
Force Ingestion in Development Branches

The concept of cloning the production database when creating a dev or feature branch database is powerful when the new feature work involves Modelling and Transformation or uses third-party technologies with orchestrators. However, if the development work you want to do is data ingestion, zero-copy-clone is not very helpful, since the very thing you are trying to develop is not run by default.

Consider a typical ingestion job, shown here for Stitch:

pipelines/includes/local_includes/stitch-jobs/my_stitch_job.yml
"My Stitch Job":
extends:
- .agent_tag
- .should_run_ingestion
stage: "My Stage"
image: $DATAOPS_STITCH_RUNNER_IMAGE
variables:
STITCH_ACTION: START
STITCH_SOURCE_ID: XXXX
STITCH_ACCESS_TOKEN: DATAOPS_VAULT(XXXX)
JOB_NAME: my_stitch_job # Not used inside the job, but used to match FORCE_INGESTION
script:
- /dataops
icon: ${STITCH_ICON}

Because this job extends the .should_run_ingestion base job, the default behavior is to skip ingestion and rely on the data provided by zero-copy-clone.

If you do want to run ingestion in your dev or feature branch, use the special pipeline execution variable FORCE_INGESTION. When it is set as part of a pipeline execution run, the pipeline runs any job whose JOB_NAME variable matches the value of FORCE_INGESTION.
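As an illustration only, a base job like .should_run_ingestion could implement this gating with GitLab-style rules. The branch names and rule details below are assumptions for the sketch, not the actual DataOps.live implementation:

.should_run_ingestion:
  rules:
    # Hypothetical: always run ingestion on the production and QA branches.
    - if: '$CI_COMMIT_REF_NAME == "production" || $CI_COMMIT_REF_NAME == "qa"'
      when: on_success
    # Hypothetical: in any other branch, run only when FORCE_INGESTION names this job.
    - if: '$FORCE_INGESTION == $JOB_NAME'
      when: on_success
    # Otherwise skip the job and rely on the zero-copy-clone data.
    - when: never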

The following image shows a typical pipeline execution run on a development or feature branch, not on production or QA:

[Image: pipeline execution run]

This run results in a pipeline where the ingestion job has not been selected, as follows:

[Image: pipeline with the ingestion job not selected]

Setting the FORCE_INGESTION pipeline execution variable, as shown in the first image below, results in a pipeline where the ingestion job has been selected, as demonstrated in the second image:

[Image: setting the FORCE_INGESTION pipeline execution variable]

[Image: pipeline with the ingestion job selected]
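If you find yourself forcing the same ingestion job on every run in a long-lived development branch, one alternative is to set the variable in the pipeline configuration file that branch uses, so every run behaves as if FORCE_INGESTION had been supplied manually. This is a sketch of a possible project setup, not a documented requirement:

# Hypothetical: set in the pipeline configuration file used by the development branch.
variables:
  FORCE_INGESTION: my_stitch_job # must match the JOB_NAME of the job to force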

In summary, the ingestion job has run even though this is not a production or QA pipeline.