Pipeline Job Dependencies
Pipelines are at the heart of DataOps. The pipeline graph visualizes pipeline executions by stage or by job dependencies. The image below shows part of a pipeline graph, which displays all of the jobs in the DataOps pipeline, grouped by stage by default, helping you track the progress of your jobs in the order in which they will execute.
However, you can also create a pipeline Directed Acyclic Graph (DAG) using the needs: keyword. The keyword configures the order in which jobs run, building a DAG that starts jobs sooner than they would start if configured solely by stages. One of the challenges of using needs: is that it creates ambiguity when looking at the pipeline graph.
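As a minimal sketch of how needs: lets a job start sooner, consider the fragment below. The job names are hypothetical, the stages are assumed to run in the order build then test, and the agent tag configuration is omitted for brevity.

```yaml
# Hypothetical jobs; stages are assumed to run in the order build, test.
stages:
  - build
  - test

load_customers:
  stage: build
  script:
    - echo loading customers

load_orders:
  stage: build
  script:
    - echo loading orders

check_customers:
  stage: test
  # Starts as soon as load_customers finishes, without waiting for load_orders.
  needs: [load_customers]
  script:
    - echo checking customers

check_all:
  stage: test
  # No needs: keyword, so this job waits for the whole build stage to finish.
  script:
    - echo checking everything
```

In a stage-based graph, check_customers and check_all appear side by side in the test stage even though they may start at different times.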
A good example of the ambiguity that needs: creates is found in the pipeline graph representation of this YAML pipeline file:
include:
  - "/pipelines/includes/config/agent_tag.yml"

data_prep_1:
  extends:
    - .agent_tag
  stage: build
  script:
    - echo preparing first data

data_prep_2:
  extends:
    - .agent_tag
  stage: build
  needs: []
  script:
    - echo preparing second data

test:
  extends:
    - .agent_tag
  stage: test
  needs: [data_prep_1, data_prep_2]
  script:
    - echo test

test_transformers:
  extends:
    - .agent_tag
  stage: test
  script:
    - echo test

transform_1:
  extends:
    - .agent_tag
  stage: deploy
  needs: [test_transformers]
  script:
    - echo transforming

transform_2:
  extends:
    - .agent_tag
  stage: deploy
  needs: [test_transformers]
  script:
    - echo transforming
Analyzing this file shows that the jobs run in the following order:
data_prep_2: This job runs first because it has no dependencies or needs (needs: []).
data_prep_1: This job has no needs: keyword, so it follows stage order and starts with the build stage.
test: This job won't run until both data_prep_1 and data_prep_2 have completed.
test_transformers: This job has no needs: keyword, so it follows stage order and runs once all of the jobs in the earlier stages have completed.
transform_1: This job won't run until test_transformers has completed, but it does not wait for test.
transform_2: Like transform_1, this job won't run until test_transformers has completed.
The needs: [] setting, that is, the needs: keyword with an empty array ([]), indicates that the job starts as soon as the pipeline is created.
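The effect of the empty array is easiest to see on a job in a later stage. The fragment below is a minimal sketch with hypothetical job names, assuming the stages run in the order build then test and omitting the agent tag configuration.

```yaml
# Hypothetical jobs; stages are assumed to run in the order build, test.
stages:
  - build
  - test

build_data:
  stage: build
  script:
    - echo building data

lint_config:
  stage: test
  # Empty needs: no dependencies, so this job starts when the pipeline is
  # created and runs in parallel with build_data despite its later stage.
  needs: []
  script:
    - echo linting config
```

data_prep_2 in the example pipeline file above uses the same setting.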
The pipeline graph by stages for the example pipeline file is as follows:
If we compare the pipeline graph with the text analysis of how the jobs should run, we can see that the two don't match. The stage-based pipeline graph does not show the running order that results from the needs: keyword, which causes ambiguity and confusion.
How do you solve this? When the job order is determined by needs: rather than by stages, you can view the pipeline by job dependencies. The first step is to click the Job dependencies button, as indicated in the following image:
You will notice that the order of the jobs in this image differs from the order in the previous stage-based image, which reduces ambiguity and confusion.
When you switch Show dependencies on, additional lines on the graph link the jobs to each other, providing a visual representation of the dependencies between the jobs in the pipeline.