Data Product Orchestrator

Available on the Professional and Enterprise editions.

Image: $DATAOPS_DATAPRODUCT_RUNNER_IMAGE

Feature status: Public Preview
The Data Product orchestrator simplifies the generation of data products within the CI/CD process. It facilitates pipeline runs to build, test, deploy, and update data products within your DataOps project. For more information, see Data Product Pipelines.

Usage

The Data Product orchestrator enriches the data product specification with the metadata from the pipeline run. At the end of the pipeline, the orchestrator generates the data product manifest as a merged document from the specification and the data product snippets and publishes the manifest into the data product registry.
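Conceptually, the manifest is produced by deep-merging the snippet documents into the specification. A minimal Python sketch of that idea (`merge_manifest` is a hypothetical helper for illustration, not the orchestrator's actual implementation):

```python
def merge_manifest(spec: dict, *snippets: dict) -> dict:
    """Deep-merge snippet documents into a specification (illustrative only)."""
    manifest = dict(spec)
    for snippet in snippets:
        for key, value in snippet.items():
            if isinstance(value, dict) and isinstance(manifest.get(key), dict):
                # Nested sections are merged recursively rather than replaced.
                manifest[key] = merge_manifest(manifest[key], value)
            else:
                manifest[key] = value
    return manifest

# A toy specification plus one pipeline-generated snippet:
spec = {"id": "my_data_product", "version": "1.0", "dataset": {"name": "orders"}}
snippet = {"dataset": {"row_count": 1200}, "pipeline_id": "4711"}
manifest = merge_manifest(spec, snippet)
# The merged manifest keeps the spec fields and folds in the snippet data.
```
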

In your project directory, create a data_product_orchestrator.yml job file under pipelines/includes/local_includes/ with the following structure:

data_product_orchestrator.yml

```yaml
"Data Product Orchestrator":
  extends:
    - .agent_tag
  stage: "Data Product"
  image: $DATAOPS_DATAPRODUCT_ORCHESTRATOR_IMAGE
  variables:
    DATAOPS_DATA_PRODUCT_SPECIFICATION_FILE: dataops/data-product-definitions/my_data_product.yml
    # Path to the data product manifest to use for backward compatibility
    # DATAOPS_DATA_PRODUCT_REFERENCE_FILE: dataops/data-product-definitions/reference_manifest.yml
    # DATAOPS_DATA_PRODUCT_REFERENCE_FILE_FROM_DIFFERENT_BRANCH: 1
  script:
    - /dataops
  artifacts:
    paths:
      - dataops/report/data_product_manifest_viewer/
    name: "Data Product Manifest Viewer"
    expose_as: "dataopsreport"
  icon: ${DATAOPS_ICON}
```
Note: You can use different methods to transform the data in your data products. For more information, see Creating Custom Data Product Snippets.

Then add the Data Product stage to the stages list (pipelines/includes/config/stages.yml) towards the end, right before Clean Up:

pipelines/includes/config/stages.yml

```yaml
stages:
  - Pipeline Initialisation
  - Vault Initialisation
  - Snowflake Setup
  - Additional Configuration
  - Data Ingestion
  - Source Testing
  - Data Transformation
  - Transformation Testing
  - Generate Docs
  - Data Product
  - Clean Up
```

Finally, include the job file /pipelines/includes/local_includes/data_product_orchestrator.yml in the full-ci.yml pipeline file:

full-ci.yml

```yaml
include:
  - /pipelines/includes/bootstrap.yml

  ...

  - /pipelines/includes/local_includes/data_product_orchestrator.yml
```

Enriching the data product specification

You must add the following variables and settings to the orchestrator to enrich the data product specification with metadata from the pipeline run: the tables used in the pipeline and the test results.

1. Use the following image for the orchestrator:

   ```yaml
   "Data Product Orchestrator":
     image: $DATAOPS_DATAPRODUCT_RUNNER_IMAGE
   ```

2. Add a variable that points to the source data product specification used to build the data product:

   ```yaml
   "Data Product Orchestrator":
     variables:
       DATAOPS_DATA_PRODUCT_SPECIFICATION_FILE: dataops/data-product-definitions/data_product_1.yml
   ```

   Using the MATE selectors in the data product specification file, the orchestrator adds all objects and tests that are part of the data product. The orchestrator adds the following pipeline run metadata to the source data product specification:

   ```yaml
   commit: <the commit id>
   branch: <the branch name>
   pipeline_id: <the pipeline id>
   run_start: <the pipeline start datetime>
   publication_datetime: <the datetime the enriched data product definition was generated>
   ```

   The MATE orchestrator extracts the relevant information from the MATE logs based on the selectors provided in the data product definition (in the dataset and SLI sections). These extracts are added to the data product specification file by the Data Product orchestrator.

Performing backward compatibility testing

To use the orchestrator to run a backward compatibility test against an existing data product, include the following variable in the orchestrator definition:

```yaml
"Data Product Orchestrator":
  variables:
    DATAOPS_DATA_PRODUCT_REFERENCE_FILE: <path to the reference data product manifest>
```

The job checks whether the dataset and SLO sections of the produced data product match the reference data product. The job fails if the ID and version are identical but the data product manifest has new or dropped attributes.
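The rule above can be pictured as a key comparison between the two manifests. A rough Python sketch of the idea (illustrative only, not the orchestrator's actual logic):

```python
def is_backward_compatible(reference: dict, candidate: dict) -> bool:
    """Fail when id and version match but attributes were added or dropped."""
    same_product = (
        reference.get("id") == candidate.get("id")
        and reference.get("version") == candidate.get("version")
    )
    if not same_product:
        return True  # different id/version: nothing to compare against
    # New or dropped top-level attributes count as breaking changes.
    return set(reference) == set(candidate)

reference = {"id": "dp1", "version": "1.0", "dataset": {}, "slo": {}}
unchanged = {"id": "dp1", "version": "1.0", "dataset": {}, "slo": {}}
dropped   = {"id": "dp1", "version": "1.0", "dataset": {}}  # "slo" removed
```

A real check would also descend into the dataset and SLO sections attribute by attribute; this sketch only shows the top-level pass/fail rule.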

Supported optional parameters

The following parameters are optional for the Data Product orchestrator:

| Parameter | Value | Description |
|---|---|---|
| DATAOPS_DATA_PRODUCT_REFERENCE_FILE_FROM_DIFFERENT_BRANCH | 0 or 1 (default 0) | If set to 1, the database value is excluded from the validation |
| DATAOPS_DATA_PRODUCT_ALLOW_BREAKING_CHANGES | 0 or 1 (default 0) | If set to 1, the orchestrator raises a warning instead of failing the pipeline when there is a breaking change or new attributes |
| DATAOPS_DATA_PRODUCT_EXCLUDE_OBJECT_ATTRIBUTES_LIST | Default: columns,mate_unique_id,type | Comma-separated list of the object attributes to exclude from the backward compatibility check |
| DATAOPS_DATA_PRODUCT_EXCLUDE_COLUMN_ATTRIBUTES_LIST | Default: index,comment | Comma-separated list of the column attributes to exclude from the backward compatibility check |
| DATAOPS_DATA_PRODUCT_EXCLUDE_SLO_ATTRIBUTES_LIST | Default: description,test_select | Comma-separated list of the SLO attributes to exclude from the backward compatibility check |
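The exclude lists can be thought of as stripping the named attributes from both manifests before the comparison runs. A minimal sketch of that assumed behavior, using simple comma-separated parsing (hypothetical helpers, not the orchestrator's actual code):

```python
import os

def parse_exclude_list(value: str) -> set:
    """Split a comma-separated exclude list into a set of attribute names."""
    return {item.strip() for item in value.split(",") if item.strip()}

def strip_excluded(attributes: dict, excluded: set) -> dict:
    """Drop excluded attributes before the compatibility comparison."""
    return {k: v for k, v in attributes.items() if k not in excluded}

# Read the exclude list from the environment, falling back to the documented default.
excluded = parse_exclude_list(
    os.environ.get(
        "DATAOPS_DATA_PRODUCT_EXCLUDE_OBJECT_ATTRIBUTES_LIST",
        "columns,mate_unique_id,type",
    )
)

obj = {"name": "orders", "type": "table", "columns": ["id"], "owner": "team"}
filtered = strip_excluded(obj, excluded)  # "type" and "columns" are dropped
```
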

Example jobs

Creating data products involves several basic steps, including adding the data product orchestrator to the project pipeline. Below is an example showing a typical job for building a data product.

For more information on the workflow for building data products, see the Standalone data products section.

data_product_orchestrator.yml

```yaml
"Data Product Orchestrator":
  extends:
    - .agent_tag
  stage: "Data Product"
  image: $DATAOPS_DATAPRODUCT_RUNNER_IMAGE
  variables:
    DATAOPS_DATA_PRODUCT_SPECIFICATION_FILE: [path to the data product specification file]
    # Optional: path to the data product manifest to use for backward compatibility
    DATAOPS_DATA_PRODUCT_REFERENCE_FILE: [path to the reference data product manifest]
    # Optional
    DATAOPS_DATA_PRODUCT_REFERENCE_FILE_FROM_DIFFERENT_BRANCH: 1
  script:
    - /dataops
  artifacts:
    paths:
      - dataops/report/data_product_manifest_viewer/
    name: "Data Product Manifest Viewer"
    expose_as: "dataopsreport"
  icon: ${DATAOPS_ICON}
```