DataOps Transform Orchestrator

For information on how to build dbt models, test them, and perform other configuration, the excellent dbt training is available, so we will not cover that here.

Instead, let us look at how DataOps runs a dbt project as part of the transform job.

Typical transform job definition

The following is a standard pipeline job definition that will build all models in the project.

/pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml
Build all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}

This job extends from a base job called .modelling_and_transformation_base, located in the DataOps Reference Project. The base job sets several internal variables and makes your job code tidier.

The variable TRANSFORM_ACTION sets the primary action the job performs. In this case it is RUN, which wraps the dbt run command. Other actions include TEST and SEED. The full list can be found in the transform orchestrator documentation.
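As a minimal sketch of switching the action, the job below mirrors the definition above but sets TRANSFORM_ACTION to TEST. The job name and the stage name are assumptions made for illustration; use a stage that exists in your own pipeline.

```yaml
# Hypothetical job name; any unique job name works.
Test all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    # TEST wraps the dbt test command.
    TRANSFORM_ACTION: TEST
  # Assumed stage name; replace with a stage defined in your pipeline.
  stage: Transformation Testing
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Only the TRANSFORM_ACTION value and the stage differ from the RUN job, since the base job supplies the rest.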

Transform actions

The following table lists all possible values for the TRANSFORM_ACTION variable, which determines the primary action your job will perform.

| TRANSFORM_ACTION | Equivalent dbt Command | Notes |
| --- | --- | --- |
| BUILD | dbt build | Note: This action is currently in Private Preview. Runs models, runs tests, takes snapshots, and loads seed files in Directed Acyclic Graph (DAG) order for selected resources or an entire project |
| RUN | dbt run | Build some or all of the project's models in Snowflake |
| TEST | dbt test | Execute some or all tests against Snowflake sources or models |
| COMPILE | dbt compile | Only execute model compilation; useful for multi-stage execution |
| SNAPSHOT | dbt snapshot | Build snapshot (type-2 SCD) models |
| SEED | dbt seed | By default, seed files are kept in your project's dataops/modelling/data directory |
| DOCS | dbt docs | There is a built-in job in most pipelines for this already! |
| OPERATION | dbt run-operation | Great for running a dbt macro in a pipeline job |
| RENDER | None | This will only render templates but not execute any dbt subcommand |

Other transform job parameters

A transform job can be configured further with additional parameters. The main ones are listed here.

| Parameter | Required/Default | Notes |
| --- | --- | --- |
| TRANSFORM_ACTION | REQUIRED | See above |
| TRANSFORM_MODEL_SELECTOR | OPTIONAL | Selects a subset of project models to operate on. Uses the same syntax as dbt's model selection. |
| TRANSFORM_OPERATION_NAME | REQUIRED if TRANSFORM_ACTION is OPERATION | Selects the name of the macro to execute |
| TRANSFORM_OPERATION_ARGS | OPTIONAL, default: {} | Used with TRANSFORM_ACTION=OPERATION to provide arguments to the macro in YAML format. |
| FULL_REFRESH | OPTIONAL | Set to 1 to trigger a full refresh of incremental models. |
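To illustrate how these parameters combine, here is a sketch of two jobs. The first builds only a subset of models with a full refresh; the second runs a macro. The job names, the tag staging, the macro name purge_old_rows, and its days_to_keep argument are all hypothetical and stand in for values from your own project.

```yaml
# Hypothetical job: rebuild only the models tagged "staging".
Build Staging Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
    # dbt model selection syntax; "staging" is an assumed tag.
    TRANSFORM_MODEL_SELECTOR: "tag:staging"
    # 1 triggers a full refresh of incremental models.
    FULL_REFRESH: 1
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}

# Hypothetical job: run a macro through dbt run-operation.
Purge Old Rows:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: OPERATION
    # Assumed macro name; it must exist in your dbt project.
    TRANSFORM_OPERATION_NAME: purge_old_rows
    # Macro arguments as a YAML string; the argument is illustrative.
    TRANSFORM_OPERATION_ARGS: "{days_to_keep: 30}"
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Quoting the selector and the arguments keeps the colon and braces from being parsed as YAML structure in the job file itself.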

For a complete parameter reference, see the transform orchestrator documentation.