Pipeline Job Dependencies

Pipelines are at the heart of DataOps. The pipeline graph visualizes a pipeline execution either by stage or by job dependencies. The image below shows part of a pipeline graph. By default, the graph groups all of the jobs in the DataOps pipeline by stage, helping you track the progress of your jobs in the order in which they execute.

[Image: partial pipeline graph screenshot]

However, you can also define the pipeline as a Directed Acyclic Graph (DAG) using the needs: keyword. The keyword configures the order in which jobs run: a job can start as soon as the jobs it needs have finished, rather than waiting for every job in the earlier stages. As a result, jobs start sooner than they would if ordered solely by stage. One of the challenges of using needs: is that it creates ambiguity when looking at the pipeline graph.
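To make the effect of needs: concrete, here is a minimal sketch. The job names build_fast, build_slow, and test_fast are hypothetical, and the sketch assumes the build stage runs before the test stage and that the .agent_tag include used elsewhere on this page is available.

```yaml
include:
  - "/pipelines/includes/config/agent_tag.yml"

# Hypothetical jobs for illustration only.
build_fast:
  extends:
    - .agent_tag
  stage: build
  script:
    - echo quick build

build_slow:
  extends:
    - .agent_tag
  stage: build
  script:
    - sleep 60

test_fast:
  extends:
    - .agent_tag
  stage: test
  # Starts as soon as build_fast succeeds;
  # it does not wait for build_slow to finish.
  needs: [build_fast]
  script:
    - echo testing
```

Without the needs: line, test_fast would wait for the whole build stage, including build_slow. The stage view still draws test_fast in the test column, which is where the ambiguity comes from.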

A good example of this ambiguity is found in the pipeline graph representation of this YAML pipeline file:

show-pipeline-dependencies-ci.yml
include:
  - "/pipelines/includes/config/agent_tag.yml"

data_prep_1:
  extends:
    - .agent_tag
  stage: build
  script:
    - echo preparing first data

data_prep_2:
  extends:
    - .agent_tag
  stage: build
  needs: []
  script:
    - echo preparing second data

test:
  extends:
    - .agent_tag
  stage: test
  needs: [data_prep_1, data_prep_2]
  script:
    - echo test

test_transformers:
  extends:
    - .agent_tag
  stage: test
  script:
    - echo test

transform_1:
  extends:
    - .agent_tag
  stage: deploy
  needs: [test_transformers]
  script:
    - echo transforming

transform_2:
  extends:
    - .agent_tag
  stage: deploy
  needs: [test_transformers]
  script:
    - echo transforming

Analyzing this file, and assuming the build, test, and deploy stages run in that order, the jobs run as follows:

  • data_prep_2: This job starts as soon as the pipeline is created because it has no dependencies (needs: [])
  • data_prep_1: This job has no needs: entry, so it follows stage order and starts when the build stage starts
  • test: This job won't run until both data_prep_1 and data_prep_2 have completed
  • test_transformers: This job has no needs: entry, so it runs once all jobs in the build stage have completed; it does not wait for test
  • transform_1: This job runs as soon as test_transformers has completed, without waiting for test
  • transform_2: This job also runs as soon as test_transformers has completed, in parallel with transform_1
note

Setting needs: [] (the needs: keyword with an empty array) indicates that the job starts as soon as the pipeline is created.
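As an illustration of this note, the following sketch places a job in a late stage but gives it an empty needs: array. The job name lint_docs is hypothetical, and the sketch assumes the .agent_tag include from the file above.

```yaml
include:
  - "/pipelines/includes/config/agent_tag.yml"

# Hypothetical job for illustration only.
lint_docs:
  extends:
    - .agent_tag
  stage: deploy
  # Empty array: no dependencies, so the job starts when the
  # pipeline is created, even though it sits in the deploy stage.
  needs: []
  script:
    - echo linting
```

In the stage view this job still appears in the deploy column, even though it can start alongside the first jobs of the pipeline.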

The pipeline graph by stages is as follows:

[Image: pipeline graph by stages]

If we compare the pipeline graph with the analysis of how the jobs should run, we can see that the two don't line up. The stage view groups jobs by stage and does not show the running order imposed by the needs: keyword, resulting in ambiguity and confusion.

How do you solve this? When the job order is determined by needs: rather than by stages, you can view the pipeline by job dependencies. The first step is to click the Job dependencies button, as indicated in the following image:

[Image: show job dependencies]

You will notice that the jobs in this image are arranged in a different order from the previous image, which grouped them by stage. Arranging them by dependency reduces ambiguity and confusion.

Switching Show dependencies on adds lines to the graph that link the jobs to each other, providing a visual representation of the dependencies between jobs in the pipeline.

[Image: pipeline graph with job dependencies on]