For nearly all DataOps pipeline jobs, the built-in orchestrator images are perfectly suitable. For the remaining use cases, you can customize these standard images (for example, to install additional packages or modules, or to apply customer-specific configuration) or even use a completely non-DataOps Docker image.
Customizing a DataOps orchestrator image
DataOps orchestrator images are built as standard Docker-compatible images, so applying customizations to one can be as simple as creating a Dockerfile that applies additional layers over the standard ones.
The custom image can be built in any CI/CD pipeline, including DataOps itself, and deployed to a local image registry or directly to your runners.
As the DataOps images contain proprietary code, please do not publish custom images to any public registries or other publicly-accessible locations.
The following Dockerfile defines an example image that starts from the standard DataOps Python3 orchestrator image and upgrades Python to version 3.10.
RUN add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update \
    && apt-get install -y python3.10 \
    && ln -sf /usr/bin/python3.10 $(which python3)
Sample build job
To build the custom image, you define a job like this:
Build Python 3.10 Orchestrator Image:
  stage: Image Build
  script:
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    - docker images
This job sits in a DataOps pipeline that only needs to run when the Dockerfile (or any other script or content built into the custom orchestrator) changes.
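One way to restrict the build to runs where the image sources changed is a `rules` clause on the job. The following is a minimal sketch, assuming your pipeline accepts GitLab CI-style `rules:changes`; the path pattern is derived from the build context used in the `docker build` command above and should be adjusted to your repository layout.

```yaml
Build Python 3.10 Orchestrator Image:
  stage: Image Build
  rules:
    # Assumption: GitLab CI-style rules are available in this pipeline.
    # Run the job only when a file under the image's build context changes.
    - changes:
        - build/custom-python310-orchestrator/**/*
  script:
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    - docker images
```

With a rule like this, pipelines that do not touch the build context skip the image build entirely.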
Using the custom orchestrator image
To use the custom image, reference it in a standard DataOps pipeline job like any other, but take care to set the job's image to the custom image:
Python 3.10 Example Job:
  image: $CUSTOM_PYTHON310_RUNNER_IMAGE
  script:
    # Verify we run Python 3.10
    - python3 --version
Note on using local images
The above examples assume images will be built and used on the same single-runner environment and not pushed to a private image registry. A configuration change is needed to the runner's config.toml file to allow local images to be used in jobs.
pull_policy = ["always", "if-not-present"]
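If you push the image to a private registry instead, the build job might be extended as in the following sketch. The registry host `registry.example.internal`, the image path, and the `REGISTRY_USER` and `REGISTRY_PASSWORD` variables are all hypothetical placeholders, not DataOps-defined names; substitute the values for your own environment.

```yaml
Build And Push Python 3.10 Orchestrator Image:
  stage: Image Build
  variables:
    # Hypothetical private registry path; replace with your own.
    REGISTRY_IMAGE: registry.example.internal/dataops/custom-python310-orchestrator:latest
  script:
    - docker build -t $REGISTRY_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    # Assumption: REGISTRY_USER and REGISTRY_PASSWORD are supplied as secret pipeline variables.
    - echo "$REGISTRY_PASSWORD" | docker login registry.example.internal -u "$REGISTRY_USER" --password-stdin
    - docker push $REGISTRY_IMAGE
```

Jobs on any runner that can reach the registry could then reference the image by its full registry path, keeping in mind that the image must stay in a private location.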
Using a non-DataOps Docker image
It is possible to use almost any valid Docker image as the base for a DataOps pipeline job, as long as it can run the job's script. However, since non-DataOps images do not contain the /dataops entry point or any supporting orchestrator scripts, integration with the DataOps Vault to utilize secrets is not possible. This includes using the DATAOPS_VAULT() method to initialize variables from the vault.
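As an illustration, a job based on a public image could look like the sketch below. The job name and stage name are hypothetical, project-specific settings such as runner selection are omitted, and `python:3.10-slim` is simply the official Python image from Docker Hub. Because the vault integration is unavailable, the variable is set to a plain value rather than through DATAOPS_VAULT().

```yaml
Plain Python Image Job:
  stage: Additional Configuration   # hypothetical stage name
  image: python:3.10-slim           # public, non-DataOps image
  variables:
    # Plain value only; vault lookups are not available in this image.
    GREETING: "hello from a non-DataOps image"
  script:
    - python3 --version
    - echo "$GREETING"
```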