How to Create a Custom Orchestrator Image

The built-in orchestrator images suit almost all DataOps pipeline jobs. For the remaining use cases, you can customize a standard image (for example, to install additional packages or modules, or to apply customer-specific configuration) or use a Docker image that is not a DataOps image at all.

Customizing a DataOps orchestrator image

DataOps orchestrator images are built as standard Docker-compatible images, so applying customizations to one can be as simple as creating a Dockerfile that applies additional layers over the standard ones.

The custom image can be built in any CI/CD pipeline, including DataOps itself, and deployed to a local image registry or directly to your runners.

Warning!

Because the DataOps images contain proprietary code, do not publish custom images to any public registry or other publicly accessible location.

Sample Dockerfile

The following Dockerfile defines an example image that builds on the standard DataOps Python3 orchestrator image and upgrades Python to version 3.10.

Dockerfile
FROM dataopslive/dataops-python3-runner:5-stable

# Add the deadsnakes PPA, install Python 3.10, and repoint python3 at it
RUN add-apt-repository -y ppa:deadsnakes/ppa \
 && apt-get update \
 && apt-get install -y python3.10 \
 && ln -sf /usr/bin/python3.10 "$(which python3)"

Sample build job

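To build the custom image, define a job like the following. The example assumes the Dockerfile is stored in the project under build/custom-python310-orchestrator: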

Build Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
  script:
    - docker build -t "$CUSTOM_PYTHON310_RUNNER_IMAGE" "$CI_PROJECT_DIR/build/custom-python310-orchestrator"
    - docker images

This job belongs in a DataOps pipeline that only needs to run when the Dockerfile, or any other script or content built into the custom orchestrator, changes.
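
One way to limit the build to such changes is a rule on the files under the build directory. The following is a minimal sketch, assuming the GitLab CI-style rules and changes keywords are available in your DataOps pipeline configuration; the glob path simply reuses the directory from the build job above.

```yaml
Build Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
  rules:
    # Assumption: GitLab CI-style rules are supported.
    # Run the job only when something under the image's build directory changes.
    - changes:
        - build/custom-python310-orchestrator/**/*
  script:
    - docker build -t "$CUSTOM_PYTHON310_RUNNER_IMAGE" "$CI_PROJECT_DIR/build/custom-python310-orchestrator"
    - docker images
```

How changes is evaluated for scheduled or manually triggered pipelines is worth checking against the platform documentation before relying on this rule.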

Using the custom orchestrator image

To use the custom image, reference it in the image: key of a job in a standard DataOps pipeline:

Python 3.10 Example Job:
  extends:
    - .agent_tag
  image: $CUSTOM_PYTHON310_RUNNER_IMAGE
  stage: Demo
  script:
    # Verify we run Python 3.10
    - python3 --version
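
The example job references $CUSTOM_PYTHON310_RUNNER_IMAGE without defining it, while the build job defines the variable only in its own variables: block. Assuming GitLab CI-style variable scoping, where job-level variables are not visible to other jobs, the image name would need to be defined at a wider scope. A minimal sketch of a pipeline-level definition follows; where this block lives in your project (for example, a shared variables file) is an assumption.

```yaml
# Assumption: this block sits in a file that every job in the pipeline inherits from
variables:
  CUSTOM_PYTHON310_RUNNER_IMAGE: my/custom-python310-orchestrator:5-stable
```

With the variable defined once at this level, the job-level definition in the build job becomes redundant and both jobs resolve the same image name.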

Note on using local images

The examples above assume that images are built and used in the same single-runner environment rather than pushed to a private image registry. To allow jobs to use local images, change the pull policy in the runner's config.toml file. With the setting below, the runner first tries to pull the image and falls back to a locally available image if the pull fails.

config.toml
...
[runners.docker]
  ...
  pull_policy = ["always", "if-not-present"]
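
If the image is instead pushed to a private registry, so that several runners can pull it, the build job might look like the sketch below. The registry host registry.example.internal and the REGISTRY_USER and REGISTRY_PASSWORD variables are hypothetical placeholders, not names defined by DataOps; how the credentials are supplied to the job is left open.

```yaml
Push Python 3.10 Orchestrator Image:
  extends:
    - .agent_tag
  image: $DATAOPS_DOCKER_RUNNER_IMAGE
  stage: Image Build
  variables:
    # Hypothetical private registry host and repository path
    CUSTOM_PYTHON310_RUNNER_IMAGE: registry.example.internal/my/custom-python310-orchestrator:5-stable
  script:
    # REGISTRY_USER and REGISTRY_PASSWORD are hypothetical credential variables
    - echo "$REGISTRY_PASSWORD" | docker login -u "$REGISTRY_USER" --password-stdin registry.example.internal
    - docker build -t "$CUSTOM_PYTHON310_RUNNER_IMAGE" "$CI_PROJECT_DIR/build/custom-python310-orchestrator"
    # Push to the private registry only, never to a public one
    - docker push "$CUSTOM_PYTHON310_RUNNER_IMAGE"
```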

Using a non-DataOps Docker image

Almost any valid Docker image can serve as the base for a DataOps pipeline job, as long as it can run the job's script. Note, however, that non-DataOps images contain neither the /dataops entry point nor the supporting orchestrator scripts, so they cannot integrate with the DataOps Vault to use secrets. This includes using the DATAOPS_VAULT() method to initialize variables from the vault.
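
As an illustration, a job running on a public Python image could be sketched as follows. The job name is hypothetical, and the stage and .agent_tag are reused from the earlier examples on the assumption that they exist in your pipeline; no DATAOPS_VAULT() references are used because the image has no /dataops entry point.

```yaml
Plain Python Image Job:
  extends:
    - .agent_tag
  # Public, non-DataOps image: no /dataops entry point, no vault integration
  image: python:3.10-slim
  stage: Demo
  script:
    - python3 --version
```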
