
The DataOps Transform Orchestrator

For information on how to build dbt models, test them, and perform other configuration, see the excellent dbt training that is already available; we will not cover that here.

Instead, let us look at how DataOps runs a dbt project as part of a transform job.

A Typical Transform Job Definition

The following is a standard pipeline job definition that will build all models in the project.

/pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml

```yaml
Build all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

This job extends a base job called .modelling_and_transformation_base, which is located in the DataOps reference project. The base job sets a number of internal variables and generally keeps your job code tidier.

The variable TRANSFORM_ACTION specifies the main action the job performs. In this case it is RUN, which wraps the dbt run command. Other actions include TEST and SEED; the full list can be found in the transform orchestrator documentation.
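
Because the action is controlled by a single variable, a job that runs the project's tests can, as a minimal sketch, reuse the same structure with a different TRANSFORM_ACTION. The job name Test all Models is illustrative, and the stage is simply copied from the example above; your project may run tests in a separate stage.

```yaml
# Hypothetical job: same base jobs as "Build all Models", but runs dbt test
Test all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: TEST    # wraps dbt test instead of dbt run
  stage: Data Transformation  # assumption: reuses the stage from the build example
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```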

Transform Actions

The following table lists all possible values for the TRANSFORM_ACTION variable, which determines the main action your job will perform.

| TRANSFORM_ACTION | Equivalent dbt Command | Notes |
|---|---|---|
| RUN | dbt run | Build some or all of the project's models in Snowflake |
| TEST | dbt test | Execute some or all tests against Snowflake sources or models |
| COMPILE | dbt compile | Only compile the models, without executing them; useful for multi-stage execution |
| SNAPSHOT | dbt snapshot | Build snapshot (type-2 SCD) models |
| SEED | dbt seed | Load seed files; by default, these are kept in your project's dataops/modelling/data directory |
| DOCS | dbt docs | Most pipelines already include a built-in job for this! |
| OPERATION | dbt run-operation | Great for running a dbt macro in a pipeline job |
| RENDER | None | Only renders templates; does not execute any dbt subcommand |
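
For example, a sketch of a job that loads the project's seed files might look like the following. The job name Load Seed Data is hypothetical and the remaining keys are copied from the build job above; only TRANSFORM_ACTION changes.

```yaml
# Hypothetical job name; loads seed files (by default from dataops/modelling/data)
Load Seed Data:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: SEED    # wraps dbt seed
  stage: Data Transformation  # assumption: stage copied from the build example
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Since models can reference seeds, a seed job like this would typically need to run before any job that builds those models.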

Other Transform Job Parameters

A transform job can be configured further with additional parameters; the main ones are listed below.

| Parameter | Required/Default | Notes |
|---|---|---|
| TRANSFORM_ACTION | REQUIRED | See above |
| TRANSFORM_MODEL_SELECTOR | OPTIONAL | Selects a subset of project models to operate on. Uses the same syntax as dbt's model selection. |
| TRANSFORM_OPERATION_NAME | REQUIRED if TRANSFORM_ACTION is OPERATION | Selects the name of the macro to execute |
| TRANSFORM_OPERATION_ARGS | OPTIONAL, default: {} | Used with TRANSFORM_ACTION=OPERATION to provide arguments to the macro in YAML format. |
| FULL_REFRESH | OPTIONAL | Set to 1 to trigger a full refresh of incremental models. |
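
As a minimal sketch, these parameters can be combined to rebuild only part of the project. The job name and the selector tag:finance are hypothetical; the selector value follows dbt's model selection syntax, in which tag:finance selects models tagged finance. Values are quoted so that YAML reads them as plain strings.

```yaml
# Hypothetical job: fully rebuild only the models tagged "finance"
Rebuild Finance Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
    TRANSFORM_MODEL_SELECTOR: "tag:finance"  # hypothetical tag; dbt selection syntax
    FULL_REFRESH: "1"                        # full refresh of incremental models
  stage: Data Transformation                 # assumption: stage copied from the build example
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```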

For a full parameter reference, please see the transform orchestration documentation.
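
Similarly, a job that runs a dbt macro sets TRANSFORM_ACTION to OPERATION and names the macro in TRANSFORM_OPERATION_NAME. In the sketch below, the job name, the macro name grant_reporting_access, and its role argument are hypothetical placeholders for a macro defined in your own project. TRANSFORM_OPERATION_ARGS is quoted so that YAML reads it as a string rather than parsing the inline {...} as a nested mapping.

```yaml
# Hypothetical job: runs a project macro via dbt run-operation
Grant Reporting Access:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: OPERATION
    TRANSFORM_OPERATION_NAME: grant_reporting_access  # hypothetical macro name
    TRANSFORM_OPERATION_ARGS: "{role: REPORTER}"      # hypothetical argument, YAML format
  stage: Data Transformation                          # assumption: stage copied from the build example
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```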