
Preventing DataOps Pipelines from Running Concurrently

By default, DataOps runs pipelines in parallel. Not only do pipelines run in parallel, but jobs of the same pipeline stage also run in parallel. Thus, you can effortlessly sequence jobs within a pipeline by adding them to different stages.
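For example, placing two jobs in different stages is enough to make them run one after the other. A minimal sketch (the job names and the Data Transformation stage are illustrative; `.agent_tag` and `${UTIL_ICON}` follow the conventions of the full example below):

```yaml
# Sketch only: jobs in the same stage run in parallel;
# jobs in different stages run sequentially, stage by stage.
Extract Data:
  extends:
    - .agent_tag
  stage: Data Ingestion        # runs first
  script:
    - echo 'extracting'
  icon: ${UTIL_ICON}

Transform Data:
  extends:
    - .agent_tag
  stage: Data Transformation   # runs only after Data Ingestion completes
  script:
    - echo 'transforming'
  icon: ${UTIL_ICON}
```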

The remaining question is how to sequence entire pipelines, whether multiple instances of the same pipeline or runs of different pipelines.

This article answers that question by showing how to ensure that a new pipeline run does not start before the current run has completed.

Adding a resource group

Resource groups are a means of limiting the concurrency of DataOps jobs. We will leverage them in this example by applying the same resource group to all jobs in a given pipeline.
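At its simplest, a resource group is declared with a single key on a job. A minimal sketch (the job and group names here are hypothetical):

```yaml
# Any jobs that share the same resource_group value never run
# concurrently, even when they belong to different pipeline runs.
My Exclusive Job:
  resource_group: my-resource   # hypothetical group name
  script:
    - echo 'holding my-resource'
```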

The resulting workflow is as follows:

  • A pipeline starts.
  • Each job assigned the resource group waits until the named resource is free before executing.
  • If the resource is held by a job in another pipeline run, the job is queued, and the pipeline pauses at that job until the resource is released.

To block the pipeline at its earliest possible stage, we introduce a new job, No Parallel Run, in the Pipeline Initialisation stage.

In addition, we apply the resource group name sequential-pipeline to every job.

pipelines/includes/local_includes/no_parallel_run.yml
```yaml
No Parallel Run:
  extends:
    - .agent_tag
  ## Recommended values for resource group names:
  # - the name of the job: limits concurrency at the job level
  # - the name of the pipeline: limits concurrency at the pipeline level
  #
  # This example uses the fixed name sequential-pipeline.
  resource_group: sequential-pipeline
  stage: Pipeline Initialisation
  script:
    - echo 'sequential-pipeline starting'
  icon: ${UTIL_ICON}

Long Ingestion Job:
  extends:
    - .agent_tag
  # Continue to use the fixed name sequential-pipeline to prevent the
  # long-running ingest from running concurrently.
  resource_group: sequential-pipeline
  stage: Data Ingestion
  script:
    - echo 'Starting execution ...'
    - sleep 60
    - echo 'Completed execution ...'
  icon: ${UTIL_ICON}

Clean Up:
  extends:
    - .agent_tag
  # Optional - continue to use the fixed name sequential-pipeline through all jobs.
  resource_group: sequential-pipeline
  stage: Clean Up
  script:
    - echo 'sequential-pipeline done'
  icon: ${UTIL_ICON}
```

To use these job definitions in a DataOps pipeline, include them as in the following example. You can then run two instances of the pipeline in parallel, and they will wait for each other.

```yaml
include:
  - /pipelines/includes/bootstrap.yml

  # sync execution
  - /pipelines/includes/local_includes/no_parallel_run.yml
```

Observing the two pipeline executions, you will see results similar to the following images:

No concurrent run, pipeline 1

No concurrent run, pipeline 2