The DataOps Transform Orchestrator
For information on how to build dbt models, test them, and perform other configuration, the excellent dbt training is available, so we will not cover that here.
Therefore, let us next look at how DataOps runs a dbt project as part of the transform job.
A Typical Transform Job Definition
The following is a standard pipeline job definition that will build all models in the project.
Build all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
This job extends a base job called .modelling_and_transformation_base, located in the DataOps reference project. The base job sets a number of internal variables and generally makes your job code tidier.
The main function the job performs is specified by the variable TRANSFORM_ACTION. In this case the value is RUN, which wraps the dbt run command.
Other actions include TEST and SEED, and the full list can be found in the transform orchestrator documentation.
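As an illustration of switching the action, a job that runs tests instead of building models might look like the sketch below. It assumes the same base jobs as the definition above. The job name "Test all Models" is hypothetical, and the stage is reused from the build job only for simplicity, so your project may well place tests in a different stage.

```yaml
# Sketch only: the job name is hypothetical; the stage is an assumption
# carried over from the build job above.
Test all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    # TEST wraps the dbt test command
    TRANSFORM_ACTION: TEST
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Only the TRANSFORM_ACTION value differs from the build job; everything else is inherited or repeated unchanged.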
Transform Actions
The following table lists all possible values for the TRANSFORM_ACTION variable, which determines the main action your job will perform.
TRANSFORM_ACTION | Equivalent dbt Command | Notes |
---|---|---|
RUN | dbt run | Build some or all of the project's models in Snowflake |
TEST | dbt test | Execute some or all tests against Snowflake sources or models |
COMPILE | dbt compile | Just execute model compilation - useful for multi-stage execution |
SNAPSHOT | dbt snapshot | Build snapshot (type-2 SCD) models |
SEED | dbt seed | By default, seed files are kept in your project's dataops/modelling/data directory |
DOCS | dbt docs | There is a built-in job in most pipelines for this already! |
OPERATION | dbt run-operation | Great for running a dbt macro in a pipeline job |
RENDER | None | This will just render templates but not execute any dbt subcommand |
Other Transform Job Parameters
A transform job can be configured further with additional parameters. The main ones are listed here.
Parameter | Required/Default | Notes |
---|---|---|
TRANSFORM_ACTION | REQUIRED | See above |
TRANSFORM_MODEL_SELECTOR | OPTIONAL | Selects a subset of project models to operate on. Uses the same syntax as dbt's model selection. |
TRANSFORM_OPERATION_NAME | REQUIRED if TRANSFORM_ACTION is OPERATION | Selects the name of the macro to execute |
TRANSFORM_OPERATION_ARGS | OPTIONAL default: {} | Used with TRANSFORM_ACTION=OPERATION to provide arguments to the macro in YAML format. |
FULL_REFRESH | OPTIONAL | Set to 1 to trigger a full refresh of incremental models. |
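To show how these parameters combine, here are two job sketches. Both assume the base jobs and stage used earlier in this section. The job names, the "finance" tag, the macro name grant_select_on_schema, and its schema_name argument are all hypothetical placeholders. Quoting the argument string is also an assumption, made so that the value is passed through as a single YAML-formatted string.

```yaml
# Sketch 1 (hypothetical names): rebuild a subset of models with a full refresh.
Rebuild Finance Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
    # Uses dbt's model selection syntax; "finance" is an assumed tag
    TRANSFORM_MODEL_SELECTOR: "tag:finance"
    # 1 triggers a full refresh of incremental models
    FULL_REFRESH: 1
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}

# Sketch 2 (hypothetical names): run a dbt macro through run-operation.
Run Grant Macro:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: OPERATION
    # Required when the action is OPERATION; the macro name is assumed
    TRANSFORM_OPERATION_NAME: grant_select_on_schema
    # Macro arguments in YAML format; the argument name is assumed
    TRANSFORM_OPERATION_ARGS: "{schema_name: REPORTING}"
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
```

Leaving out TRANSFORM_MODEL_SELECTOR makes the job operate on the whole project, as in the first example of this section.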
For a full parameter reference, please see the transform orchestrator documentation.