
How to Create a Custom Orchestrator Image

The built-in orchestrator images are suitable for nearly all DataOps pipeline jobs. For the remaining use cases, you can customize these standard images (e.g. to install additional packages/modules or apply customer-specific configuration) or even use a completely non-DataOps Docker image.

Customizing a DataOps orchestrator image

DataOps orchestrator images are built as standard Docker-compatible images, so applying customizations to one can be as simple as creating a Dockerfile that applies additional layers over the standard ones.

The custom image can be built in any CI/CD pipeline, including DataOps itself, and deployed to a local image registry or directly to your runners.

Warning!

As the DataOps images contain proprietary code, please do not publish custom images to any public registries or other publicly-accessible locations.

Sample Dockerfile

The following Dockerfile defines an example image that builds on the standard DataOps Python3 orchestrator image and upgrades Python to version 3.10.

Dockerfile
FROM dataopslive/dataops-python3-runner:5-stable

RUN add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update \
    && apt-get install -y python3.10 \
    && ln -sf /usr/bin/python3.10 "$(which python3)"

Sample build job

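To build the custom image, define a job like the following. It expects the Dockerfile to be stored in the project's build/custom-python310-orchestrator directory: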

Build Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
  script:
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    - docker images

This job belongs in a DataOps pipeline that only needs to run when the Dockerfile (or any other scripts/content built into the custom orchestrator) changes.
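
One way to express this "only when changed" condition is a change-based rule on the build job. The following is a minimal sketch, assuming the GitLab CI-style rules:changes keyword is available in your DataOps pipelines. The path pattern simply mirrors the build directory used in the job's docker build command.

```yaml
Build Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
  script:
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    - docker images
  rules:
    # Assumption: rules:changes is supported; run only when the build context changes
    - changes:
        - build/custom-python310-orchestrator/**/*
```

It is worth checking how such a rule behaves for scheduled or manually triggered pipelines in your environment, since change detection may be evaluated differently when there is no push event.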

Using the custom orchestrator image

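To use the custom image, define a job in a standard DataOps pipeline as you would any other, but point its image: key at the custom image. Because variables defined inside one job are not visible to other jobs, the example assumes CUSTOM_PYTHON310_RUNNER_IMAGE is also defined at a level this job can see (e.g. as a project-level variable):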

Python 3.10 Example Job:
  extends:
    - .agent_tag
  image: $CUSTOM_PYTHON310_RUNNER_IMAGE
  stage: Demo
  script:
    # Verify we run Python 3.10
    - python3 --version

Note on using local images

The examples above assume the image is built and used on the same single runner and is not pushed to a private image registry. For jobs to use such local images, the pull policy in the runner's config.toml file must be changed so that the runner falls back to a locally available image when the image cannot be pulled:

config.toml
...
[runners.docker]
...
pull_policy = ["always", "if-not-present"]
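
If the image needs to be shared across several runners instead, the build job can push it to a private registry. The following is a rough sketch, not a definitive setup. It assumes the image behind $DATAOPS_DOCKER_RUNNER_IMAGE provides the Docker CLI (as the build job implies) and that REGISTRY_HOST, REGISTRY_USER and REGISTRY_PASSWORD are hypothetical variables you supply yourself.

```yaml
Push Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    # Hypothetical registry host and image path; replace with your own private registry
    CUSTOM_PYTHON310_RUNNER_IMAGE: $REGISTRY_HOST/my/custom-python310-orchestrator:5-stable
  script:
    # REGISTRY_USER and REGISTRY_PASSWORD are assumed to be provided as secret variables
    - echo "$REGISTRY_PASSWORD" | docker login "$REGISTRY_HOST" -u "$REGISTRY_USER" --password-stdin
    - docker build -t $CUSTOM_PYTHON310_RUNNER_IMAGE $CI_PROJECT_DIR/build/custom-python310-orchestrator
    - docker push $CUSTOM_PYTHON310_RUNNER_IMAGE
```

With this approach the runners pull the image from the registry, so they also need credentials for that registry. As the warning above states, the registry must not be publicly accessible.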

Using a non-DataOps Docker image

Almost any valid Docker image can serve as the base for a DataOps pipeline job, as long as it can run the job's script. However, non-DataOps images contain neither the /dataops entry point nor the supporting orchestrator scripts, so jobs that use them cannot integrate with the DataOps Vault to use secrets. In particular, the DATAOPS_VAULT() method cannot be used to initialize variables from the vault.
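
As an illustration, a job that uses a non-DataOps image could look like the sketch below. The job name and the public python:3.10-slim image are example choices, not DataOps recommendations. The script is limited to plain commands because no vault integration is available.

```yaml
Non-DataOps Image Example Job:
  extends:
    - .agent_tag
  # Public Python image from Docker Hub, used here purely as an illustration
  image: python:3.10-slim
  stage: Demo
  script:
    # Plain script only: no /dataops entry point and no DATAOPS_VAULT() lookups
    - python3 --version
```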