How to Use Parent-Child Pipelines
You can configure DataOps projects to trigger one pipeline from another using a technique called parent and child pipelines. The child (downstream) pipeline can reside in the same project as the parent or in a different one, and variables can be passed from the parent pipeline to the child.
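As a minimal sketch of the idea (the job name, child file name, and variable below are illustrative, not part of the standard template), a parent pipeline triggers a child via a trigger job:

```yaml
# Illustrative parent-side trigger job; all names here are hypothetical.
Trigger Child:
  stage: Downstream              # a stage defined in the project's stages.yml
  variables:
    SOME_VARIABLE: some-value    # passed down into the child pipeline
  trigger:
    include: child-ci.yml        # child pipeline definition in this project
    strategy: depend             # wait for, and mirror, the child's result
```

The real-world examples later in this section show the actual template files this pattern is built from.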
Why use parent-child pipelines?
The #TrueDataOps philosophy favors small, atomic pieces of code that can be reused quickly and efficiently to reduce overall prototyping and development time (from the #TrueDataOps Philosophy). To that end, each DataOps pipeline should ideally be a highly focussed unit of work, performing the minimum number of actions (jobs) needed to achieve its primary purpose.
However, many of the activities within a pipeline (e.g., Snowflake setup, source testing, model building, etc.) are common across multiple pipelines, often across multiple projects, potentially leading to code duplication (violating the DRY principle). Dedicated job definitions, shared within each project and from reference projects, are an existing solution for avoiding the duplication of individual job configurations.
Parent and child pipelines extend this concept of encapsulation, allowing larger pipelines to be broken down into smaller, more focused units of work which can be reused and triggered from one another.
Conceptual example
Here's a screenshot of a standard DataOps pipeline, straight out of the project template:
This can be abstracted conceptually to the following equivalent diagram:
Apart from the initial section Pipeline start-up, which will necessarily feature in every DataOps pipeline, there are two activities being performed here: an infrastructure task (maintaining Snowflake's databases, schemas, grants, etc.) and a data modeling task. However, many use cases will not require both activities to be performed together in this manner, and developers will often want to execute these activities separately when only one part of the codebase is being worked on.
We can therefore break this down into two separate pipelines, capable of being executed independently but chained together for the standard approach we see above.
This means we can now execute both activities independently or together, without duplication. And that's not all; using parent-child pipelines, we can further:
- allow either pipeline to be triggered from another project's pipeline
- start other pipelines (in this project or elsewhere) at the end of these pipelines
- parameterize the model build/test activity and run it in parallel for different sets of models
- and more
Real-world examples
Here are some actual project configurations that can be used to set up working parent-child implementations.
Example 1: simple in-project parent-child
This sample configuration follows the conceptual example above, separating the SOLE (Snowflake Object Lifecycle Engine) and MATE (Modelling and Transformation Engine) sections into different pipelines.
Here, we are using three separate pipeline files:
- `sole-ci.yml` - The SOLE jobs from the standard pipeline configuration
- `mate-ci.yml` - The MATE jobs from the standard pipeline configuration
- `full-ci.yml` - The full parent-child pipeline implementation
It's worth noting that every pipeline file includes a reference to `bootstrap.yml`, ensuring that each pipeline can be executed independently; when pipelines are combined, the redundant includes are automatically deduplicated.
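As a contrived illustration of this deduplication (directly combining the two files, which none of the examples below actually does), a pipeline including both files would still process `bootstrap.yml` only once:

```yaml
# Contrived example: both included files themselves include
# /pipelines/includes/bootstrap.yml, but that duplicate reference is
# resolved only once in the combined pipeline.
include:
  - /sole-ci.yml
  - /mate-ci.yml
```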
sole-ci.yml:

```yaml
include:
  - /pipelines/includes/bootstrap.yml

  ## Snowflake Object Lifecycle jobs
  - project: "reference-template-projects/dataops-template/dataops-reference"
    ref: 5-stable
    file: "/pipelines/includes/default/snowflake_lifecycle.yml"
```
mate-ci.yml:

```yaml
include:
  - /pipelines/includes/bootstrap.yml

  ## Modelling and transformation jobs
  - /pipelines/includes/local_includes/modelling_and_transformation/test_all_sources.yml
  - /pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml
  - /pipelines/includes/local_includes/modelling_and_transformation/test_all_models.yml

  ## Generate modelling and transformation documentation
  - project: "reference-template-projects/dataops-template/dataops-reference"
    ref: 5-stable
    file: "/pipelines/includes/default/generate_modelling_and_transformation_documentation.yml"
```
full-ci.yml:

```yaml
include:
  - /pipelines/includes/bootstrap.yml
  - /sole-ci.yml

Trigger MATE:
  stage: Downstream
  inherit:
    variables: false
  trigger:
    include: mate-ci.yml
    strategy: depend
```
The following parent-child features have been employed in full-ci.yml's Trigger MATE job:

- A new stage named `Downstream` is used. This stage must be added to the project's stages.yml file, usually towards the end of the list (the exact position will depend on your use case).
- Setting `inherit:variables` to `false` prevents the parent pipeline's configuration from polluting that of the child pipeline, which picks up its configuration as if it had been executed separately.
- Setting `trigger:strategy` to `depend` causes the parent pipeline to wait until the child pipeline completes, and to report success only if the child pipeline is successful.
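For example, a stages.yml extended with the new stage might look like the sketch below; the other stage names here are placeholders, so keep your project's actual stage list and simply append the new entry:

```yaml
# Illustrative stages.yml; the first three names are placeholders for
# your project's existing stages, with Downstream appended near the end.
stages:
  - Pipeline Initialisation
  - Snowflake Setup
  - Data Transformation
  - Downstream
```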
Example 2: multiple child pipelines
Extending the above simple example further, we can execute more than one child pipeline in parallel. In this sample configuration, our MATE models are split in two, with one child pipeline building/testing each set.
Here we are reusing the same sole-ci.yml and mate-ci.yml as in the previous example.
full-ci.yml:

```yaml
include:
  - /pipelines/includes/bootstrap.yml
  - /sole-ci.yml

Trigger MATE (Set 1):
  stage: Downstream
  inherit:
    variables: false
  variables:
    TRANSFORM_MODEL_SELECTOR: models/set1
  trigger:
    include: mate-ci.yml
    strategy: depend

Trigger MATE (Set 2):
  stage: Downstream
  inherit:
    variables: false
  variables:
    TRANSFORM_MODEL_SELECTOR: models/set2
  trigger:
    include: mate-ci.yml
    strategy: depend
```
Note the use of the variable `TRANSFORM_MODEL_SELECTOR` in each trigger job, which is passed into each child pipeline to control the operation of all MATE jobs.
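To illustrate how a variable passed this way could be consumed, here is a hypothetical sketch of a child-pipeline job; this is not the template's actual MATE job definition, and the dbt command and stage name are assumed for illustration only:

```yaml
# Hypothetical child-pipeline job: the variable set in the parent's
# trigger job arrives as an ordinary environment variable.
Build Selected Models:
  stage: Data Transformation       # assumed stage name
  script:
    - dbt build --select "$TRANSFORM_MODEL_SELECTOR"   # assumed command
```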
Example 3: multi-project pipelines
Instead of breaking down a project's pipelines internally, as we have done in the above examples, this sample configuration triggers the full-ci.yml pipeline in another project once the local pipeline jobs are finished.
The parent project's full-ci.yml is shown below, but the child project's full-ci.yml needs no particular configuration for this functionality.
full-ci.yml (parent project):

```yaml
include:
  - /pipelines/includes/bootstrap.yml
  - /sole-ci.yml
  - /mate-ci.yml

Trigger Project 2:
  stage: Downstream
  inherit:
    variables: false
  variables:
    _PIPELINE_FILE_NAME: full-ci.yml
  trigger:
    project: "dataops-internal/sam/parent-and-child-pipelines/project-two"
    branch: $CI_COMMIT_REF_NAME
    # strategy: depend
```
Please note the parent-child features that have been used here:

- The parameter `inherit:variables` is still set to `false`, as this configuration has the same issue with the pollution of child pipeline variables.
- To select which of the child project's pipelines to trigger, set the variable `_PIPELINE_FILE_NAME` to the pipeline filename.
- Passing the built-in variable `$CI_COMMIT_REF_NAME` to `trigger:branch` ensures that the child pipeline runs on the branch with the same name as the one the parent pipeline is executing on.
- Typically, with multi-project pipelines, we want to trigger the child project's pipeline without waiting for the result, which is why this configuration comments out the `strategy` parameter.
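Should you instead need the parent to wait for the child project's pipeline and mirror its status, reinstating the commented-out `strategy` parameter achieves this:

```yaml
# Same trigger job as above, but blocking on the child project's result.
Trigger Project 2:
  stage: Downstream
  inherit:
    variables: false
  variables:
    _PIPELINE_FILE_NAME: full-ci.yml
  trigger:
    project: "dataops-internal/sam/parent-and-child-pipelines/project-two"
    branch: $CI_COMMIT_REF_NAME
    strategy: depend
```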