DataOps Transform Orchestrator
For information on how to build dbt models, test them, and perform other configuration, the excellent dbt training is available, so we will not cover that here.
Instead, let us look at how DataOps runs a dbt project as part of the transform job.
Typical transform job definition
The following is a standard pipeline job definition that will build all models in the project.

    Build all Models:
      extends:
        - .modelling_and_transformation_base
        - .agent_tag
      variables:
        TRANSFORM_ACTION: RUN
      stage: Data Transformation
      script:
        - /dataops
      icon: ${TRANSFORM_ICON}

This job extends a base job called .modelling_and_transformation_base, located in the DataOps Reference Project.
The base job sets several internal variables and makes your job code tidier.
The variable TRANSFORM_ACTION specifies the primary action the job performs. In this case it is RUN, which wraps the dbt run command.
Other actions include TEST and SEED. The full list can be found in the transform orchestrator documentation.
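As an illustration of switching actions, a test job could look like the sketch below. It mirrors the build job above and changes only TRANSFORM_ACTION. The job name and the stage name Transformation Testing are assumptions made for this example, so substitute the stages defined in your own pipeline.

```yaml
# Hypothetical job name; only TRANSFORM_ACTION differs from the build job
Test all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: TEST          # wraps the dbt test command
  stage: Transformation Testing     # assumed stage name, adjust to your pipeline
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Keeping run and test in separate jobs lets each appear as its own step in the pipeline, so a test failure is visible independently of the model build.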
Transform actions
The following table lists all possible values for the TRANSFORM_ACTION variable, which determines the primary action your job will perform.
TRANSFORM_ACTION | Equivalent dbt Command | Notes |
---|---|---|
BUILD | dbt build | Note: This action is currently in Private Preview. Run models, run tests, take snapshots, and load seed files in Directed Acyclic Graph (DAG) order for selected resources or an entire project |
RUN | dbt run | Build some or all of the project's models in Snowflake |
TEST | dbt test | Execute some or all tests against Snowflake sources or models |
COMPILE | dbt compile | Just execute model compilation - useful for multi-stage execution |
SNAPSHOT | dbt snapshot | Build snapshot (type-2 SCD) models |
SEED | dbt seed | By default, seed files are kept in your project's dataops/modelling/data directory |
DOCS | dbt docs | There is a built-in job in most pipelines for this already! |
OPERATION | dbt run-operation | Great for running a dbt macro in a pipeline job |
RENDER | None | Renders templates only and does not execute any dbt subcommand |
Other transform job parameters
You can configure a transform job further with additional parameters. The main ones are listed here.
Parameter | Required/Default | Notes |
---|---|---|
TRANSFORM_ACTION | REQUIRED | See above |
TRANSFORM_MODEL_SELECTOR | OPTIONAL | Selects a subset of project models to operate on. Uses the same syntax as dbt's model selection. |
TRANSFORM_OPERATION_NAME | REQUIRED if TRANSFORM_ACTION is OPERATION | Specifies the name of the macro to execute |
TRANSFORM_OPERATION_ARGS | OPTIONAL default: {} | Used with TRANSFORM_ACTION=OPERATION to provide arguments to the macro in YAML format. |
FULL_REFRESH | OPTIONAL | Set to 1 to trigger a full refresh of incremental models. |
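
The parameters above might be combined as in the following sketch. The job names, the tag:nightly selector, the macro name my_cleanup_macro, and its days_to_keep argument are all hypothetical. Only the variable names come from the table.

```yaml
# Hypothetical job: rebuild only the models tagged "nightly" with a full refresh
Rebuild Nightly Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
    TRANSFORM_MODEL_SELECTOR: "tag:nightly"   # dbt selection syntax; the tag is assumed
    FULL_REFRESH: 1                           # full refresh of incremental models
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}

# Hypothetical job: run a macro and pass it arguments in YAML format
Run Cleanup Macro:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: OPERATION
    TRANSFORM_OPERATION_NAME: my_cleanup_macro        # assumed macro name
    TRANSFORM_OPERATION_ARGS: "{days_to_keep: 30}"    # assumed macro argument
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Quoting the selector and the arguments keeps the pipeline YAML parser from interpreting the colon or the braces, so the values reach the job as plain strings.
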
For a complete parameter reference, please see the transform orchestrator documentation.