How to Prevent DataOps Pipelines from Running Concurrently
By default, DataOps runs pipelines in parallel. Not only do pipelines run in parallel, but jobs of the same pipeline stage also run in parallel. Thus, you can effortlessly sequence jobs within a pipeline by adding them to different stages.
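As a minimal sketch of sequencing jobs by stage, the fragment below places two jobs in different stages. The job names First Job and Second Job are hypothetical, the stage names are borrowed from later in this article, and the sketch assumes the stage order is declared with a standard `stages:` list (your project may declare it in a shared include file instead).

```yaml
# Sketch only: job names are hypothetical; stage names come from this article.
stages:
  - Pipeline Initialisation
  - Data Ingestion

First Job:
  stage: Pipeline Initialisation
  script:
    - echo 'runs in the first stage'

Second Job:
  stage: Data Ingestion
  script:
    # By default, this job starts only after the jobs in the earlier stage succeed
    - echo 'runs in the second stage'
```

Two jobs placed in the same stage would instead start in parallel, which is the behavior described above.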
As a result, the question that remains is how to sequence runs across different instances of the same pipeline, or across different pipelines.
This article answers this question by focusing on how to ensure that a new pipeline does not start before the current pipeline run has been completed.
Adding a resource group
Resource groups are a means of limiting the concurrency of DataOps jobs. We will leverage them in this example by applying the same resource group to all jobs in a given pipeline.
The resulting workflow is as follows:
- A pipeline starts
- Before a job with a resource group runs, it checks whether the resource of the given name is free
- If the resource is not free, the job waits, and the pipeline holds at that point, until the resource is released
In order to achieve the desired result, we will introduce a new job, No Parallel Run, at the Pipeline Initialisation stage to ensure that the pipeline is blocked at its earliest possible stage. In addition, we apply the resource group name sequential-pipeline to every job.
No Parallel Run:
  ## Recommended values for resource group names
  # Using the name of the job - limiting concurrency at the job level
  # Using the name of the pipeline - limiting concurrency at the pipeline level
  # the example uses the fixed name sequential-pipeline
  resource_group: sequential-pipeline
  stage: Pipeline Initialisation
  script:
    - echo 'sequential-pipeline starting'

Long Ingestion Job:
  # continue to use the fixed name sequential-pipeline to prevent the long running
  # ingest running concurrently
  resource_group: sequential-pipeline
  stage: Data Ingestion
  script:
    - echo 'Starting execution ...'
    - sleep 60
    - echo 'Completed execution ...'
Clean Up Job:  # hypothetical job name
  # optional - continue to use the fixed name sequential-pipeline through all jobs
  resource_group: sequential-pipeline
  stage: Clean Up
  script:
    - echo 'sequential-pipeline done'
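The comments in the job definitions above mention naming the resource group after the job to limit concurrency at the job level. One possible way to do this, assuming the predefined variable CI_JOB_NAME is available and that variables are expanded in resource group names in your environment, is sketched below. The job name Per Job Lock is hypothetical.

```yaml
# Sketch only: assumes CI_JOB_NAME is a predefined variable and that
# variable expansion is supported in resource group names.
Per Job Lock:
  # Each job name becomes its own resource group, so only instances
  # of this particular job are serialized across pipeline runs
  resource_group: $CI_JOB_NAME
  stage: Data Ingestion
  script:
    - echo 'only one instance of this job runs at a time'
```

With this variant, other jobs in the pipeline can still overlap between runs. A fixed name shared by every job, as in the example above, is what limits concurrency at the pipeline level.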
To use these job definitions, include them in a DataOps pipeline, like the following example. You can then start two instances of the pipeline at the same time, and the jobs that share the resource group will wait for each other instead of running concurrently.
# sync execution
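As a rough sketch of what such a pipeline file could look like, the fragment below pulls the job definitions in with a local include. The file path is hypothetical, and the sketch assumes the jobs were saved to a separate file in the same repository. A real DataOps pipeline file will typically need further includes that are not shown here.

```yaml
# Sketch only: the path below is hypothetical and must match where
# the job definitions are actually stored in your project.
include:
  - local: /pipelines/includes/local_includes/sequential_jobs.yml
```

Starting this pipeline twice in quick succession is one way to observe the waiting behavior.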
Observing the two pipeline executions you will see results similar to the following images: