How to Create a Custom Orchestrator Image
For the vast majority of DataOps pipeline jobs, the built-in orchestrator images are perfectly suitable. For the remaining use cases, you can customize these standard images (e.g. to install additional packages/modules or apply customer-specific configuration) or even use a completely non-DataOps Docker image.
Customizing a DataOps orchestrator image
DataOps orchestrator images are built as standard Docker-compatible images, so applying customizations to one can be as simple as creating a Dockerfile that applies additional layers over the standard ones.
The custom image can be built in any CI/CD pipeline, including DataOps itself, and deployed to a local image registry or directly to your runners.
As the DataOps images contain proprietary code, please do not publish custom images to any public registries or other publicly-accessible locations.
Sample Dockerfile
The following Dockerfile defines an example image that starts from the standard DataOps Python3 orchestrator image and upgrades Python to version 3.10.
FROM dataopslive/dataops-python3-runner:5-stable

# Add the deadsnakes PPA, install Python 3.10, and repoint python3 at it
RUN add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update \
    && apt-get install -y python3.10 \
    && ln -sf /usr/bin/python3.10 "$(which python3)"
Sample build job
To build the custom image, you define a job like this:
Build Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
  script:
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    - docker images
This job belongs in a DataOps pipeline that only needs to run when the Dockerfile (or any other scripts or content built into the custom orchestrator) changes.
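One way to restrict the build to changes in the build context, and to publish the result to a private registry instead of relying on a local image, is sketched below. Treat it as an illustration under stated assumptions rather than a supported recipe: it presumes that GitLab-style `rules:` / `changes:` keywords are available in your DataOps pipelines and that the runner is already authenticated against the registry. The job name and the host `registry.example.internal` are hypothetical placeholders.

```yaml
Build And Push Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    # Hypothetical private registry host; replace with your own.
    CUSTOM_PYTHON310_RUNNER_IMAGE: registry.example.internal/my/custom-python310-orchestrator:5-stable
  rules:
    # Assumption: GitLab-style rules are supported.
    # Run only when something in the build context changes.
    - changes:
        - build/custom-python310-orchestrator/**/*
  script:
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    # Assumes the runner has already logged in to the private registry.
    - docker push $CUSTOM_PYTHON310_RUNNER_IMAGE
```

Pushing to a private registry keeps the image away from public locations while making it available to more than one runner.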
Using the custom orchestrator image
To use the custom image, reference it in a standard DataOps pipeline job like any other, taking care to set the image: key:
Python 3.10 Example Job:
  extends:
    - .agent_tag
  image: $CUSTOM_PYTHON310_RUNNER_IMAGE
  stage: Demo
  script:
    # Verify we run Python 3.10
    - python3 --version
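In the sample build job, CUSTOM_PYTHON310_RUNNER_IMAGE is declared inside that job's own `variables:` block, so other jobs would not normally see it. A minimal sketch of one way to share it, assuming your project resolves top-level `variables:` in the usual GitLab-style manner (where exactly this block lives in your project is an assumption to adapt):

```yaml
# Assumption: a top-level variables block visible to every job in the pipeline.
variables:
  # Same image name and tag as used by the build job.
  CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
```

With the variable defined once at this level, the build job and any consuming job can both refer to `$CUSTOM_PYTHON310_RUNNER_IMAGE` without repeating the image name.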
Note on using local images
The above examples assume that images are built and used in the same single-runner environment and are not pushed to a private image registry. To allow local images to be used in jobs, change the pull policy in the runner's config.toml file:
...
[runners.docker]
...
pull_policy = ["always", "if-not-present"]
Using a non-DataOps Docker image
It is possible to use almost any valid Docker image as the base for a DataOps pipeline job, as long as it can run the job's script. However, since non-DataOps images do not contain the /dataops entry point or any supporting orchestrator scripts, integration with the DataOps Vault to use secrets is not possible. This includes using the DATAOPS_VAULT() method to initialize variables from the vault.
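As a rough illustration of the point above, a job based on a non-DataOps image might look like the following sketch. The job name is hypothetical, and `python:3.10-slim` is simply the public Python image from Docker Hub, chosen as an example; any image able to run the script would do.

```yaml
Non-DataOps Image Example Job:
  extends:
    - .agent_tag
  # Public, non-DataOps image used purely for illustration.
  image: python:3.10-slim
  stage: Demo
  script:
    # Plain shell commands only: this image has no /dataops entry point,
    # so DATAOPS_VAULT() and other vault integration cannot be used here.
    - python3 --version
```

Any configuration such a job needs would have to reach it by means other than the DataOps Vault.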