
Modelling and Transformation Orchestrator

Type: Pre-Set
Image: $DATAOPS_TRANSFORM_RUNNER_IMAGE

The Modelling and Transformation (MATE) orchestrator is a pre-set orchestrator that takes the models in the /dataops/modelling directory and executes them in a Snowflake data warehouse, first compiling them to SQL and then running the resulting SQL statements.

Multiple operations are possible within the Modelling and Transformation Engine (MATE). To trigger the desired operation, set the parameter TRANSFORM_ACTION to one of the supported values.

Usage

The Modelling and Transformation orchestrator must always be used together with the DataOps Reference Project, which provides, among other things, the .modelling_and_transformation_base job.

pipelines/includes/local_includes/mate_jobs/build_all_models.yml
"Build all Models":
extends:
- .modelling_and_transformation_base
- .agent_tag
stage: "Data Transformation"
image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
variables:
TRANSFORM_ACTION: RUN
script:
- /dataops
icon: ${TRANSFORM_ICON}

Supported parameters

Parameter | Required/Default | Description
TRANSFORM_ACTION | REQUIRED | Must be one of BUILD, RUN, COMPILE, SNAPSHOT, DOCS, TEST, OPERATION, RENDER, or SEED
DATABASE_PROFILE | REQUIRED, defaults to snowflake_operations | Which dbt profile to use from profiles.template.yml
TRANSFORM_PROJECT_PATH | REQUIRED, defaults to $CI_PROJECT_DIR/templates/modeling | The directory in the project structure where the base of the Modelling and Transformation project is located
TRANSFORM_OPERATION_NAME | REQUIRED if TRANSFORM_ACTION is set to OPERATION | The macro/operation to execute
DATABASE_TARGET | REQUIRED, defaults to other | Which dbt profile target to use from profiles.template.yml
TRANSFORM_DEBUG_TIMING | Optional, defaults to blank | When set, saves performance profiling information to a file timing.log in the TRANSFORM_PROJECT_PATH directory. To view the report, use the snakeviz tool: snakeviz timing.log
TRANSFORM_EXTRA_PARAMS_AFTER | Optional, defaults to blank | Additional command-line arguments added to the end of the transform orchestrator command
TRANSFORM_EXTRA_PARAMS_BEFORE | Optional, defaults to blank | Additional command-line arguments added to the beginning of the transform orchestrator command
TRANSFORM_FORCE_DEPS | Optional, defaults to blank | Whether to force a refresh of external transformation libraries before execution
TRANSFORM_MODEL_SELECTOR | Optional, defaults to blank | The scope of the project to execute: the name of a model or a tag selector, e.g. person or tag:curation
TRANSFORM_OPERATION_ARGS | Optional, defaults to {} | A YAML string representing the macro arguments, e.g. {arg1: value1, arg2: 345}
TRANSFORM_PARTIAL_PARSE | Optional, defaults to blank | When set, disables partial parsing in the project. See the section on partial parsing for more information
FULL_REFRESH | Optional, defaults to blank | If set, forces incremental models to be fully refreshed
DATAOPS_REMOVE_RENDERED_TEMPLATES | Optional, defaults to blank | If set, the system removes any templates found after processing. This allows files of the format <modelname>.template.yml to be used without creating extra models in the project
Note: Make sure to set DATAOPS_REMOVE_RENDERED_TEMPLATES in the variables.yml file in the project settings and not within your job.
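
Since this parameter belongs in the project settings rather than in a job, here is a minimal sketch of setting it in pipelines/includes/config/variables.yml. The value 1 is an assumed "set" value, since the parameter only distinguishes set from blank:

pipelines/includes/config/variables.yml
variables:
  DATAOPS_REMOVE_RENDERED_TEMPLATES: 1  # any non-blank value counts as "set"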

Feature release status: Private Preview
Using TRANSFORM_ACTION: BUILD

The private preview of dbt 1.3 adds a new TRANSFORM_ACTION: BUILD, which runs the following four commands in a single job:

  • RUN
  • TEST
  • SNAPSHOT
  • SEED

It follows the Directed Acyclic Graph (DAG) order for selected resources or the entire project. The BUILD action allows you to combine jobs spread across multiple stages into one.

To benefit from this feature, you must enable the dbt 1.3 private preview in your branch by adding the following variable to the project variables:

pipelines/includes/config/variables.yml
variables:
  DATAOPS_RUNNER_IMAGE_TAG: pripre-dbt1.3

Subsequent pipelines on the branch will use the orchestrator version that has the dbt 1.3 preview enabled.

The following example shows how to run BUILD for all models, snapshots, and seeds tagged with finance:

pipelines/includes/local_includes/mate_jobs/build_tagged_models.yml
"Build all Models":
extends:
- .modeling_and_transformation_base
- .agent_tag
stage: "Data Transformation"
image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
variables:
TRANSFORM_ACTION: BUILD
TRANSFORM_MODEL_SELECTOR: tag:finance
script:
- /dataops
icon: ${TRANSFORM_ICON}

Learn more about the dbt 1.3 private preview.

The following sections expand on several of these supported parameters:

Test reporting

The test reporting feature breaks down into the following areas:

1. Enable test reporting

The Transform orchestrator generates a test report when running a TEST job. To surface this report in the DataOps platform, the job must publish it as an artifact, as the artifacts block does in the following example:

pipelines/includes/local_includes/mate_jobs/my_test_job.yml
"My Test Job":
extends:
- .modeling_and_transformation_base
- .agent_tag
stage: "Transformation Testing"
image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
variables:
TRANSFORM_ACTION: TEST
script:
- /dataops
artifacts:
when: always
reports:
junit: $CI_PROJECT_DIR/report.xml
icon: ${TRANSFORM_ICON}

Once the job has run successfully, it adds the test results to the pipeline's Tests tab.

Screenshot: the pipeline's Tests tab.

2. Test states

A test result can fall into one of the following states:

State | Description | Example
PASS | The test executed and passed | The target fulfilled its specified condition, such as the correct number of rows present in a database table
FAIL | The test executed and failed | The target did not meet its specified condition, such as an incorrect number of rows present in a table
ERROR | An error occurred while executing the test | For example, an invalid column/table name specified in a test against a table/dataset
SKIPPED | The test was skipped due to an unfulfilled condition | The test did not execute because of an unfulfilled condition in a dataset/table
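
To make these states concrete, here is a minimal, hypothetical dbt test definition of the kind MATE executes when TRANSFORM_ACTION is TEST. The model, column, and file names are illustrative only:

dataops/modelling/models/example/person.yml
version: 2
models:
  - name: person
    columns:
      - name: id
        tests:
          - not_null  # PASS if no NULL ids exist; FAIL otherwise
          - unique    # PASS if all ids are distinct; FAIL otherwise
      - name: misspelled_column
        tests:
          - not_null  # ERROR if the column does not exist in the target table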

3. Test report control

The following parameters control the generation of test reports:

Parameter | Required/Optional | Description
JSON_PATH | Optional, defaults to $TRANSFORM_PROJECT_PATH/target/run_results.json | The path to the JSON results generated by the Transform orchestrator
REPORT_NAME | Optional, defaults to report.xml | The name of the generated report
REPORT_DIR | Optional, defaults to $CI_PROJECT_DIR | The path where the generated report is saved. Note: This directory must already exist before the tests run
TREAT_TEST_ERRORS_AS_FAILED | Optional, defaults to FALSE | If enabled, reports a test error as FAIL. See Test states for more information
TREAT_TEST_WARNS_AS_FAILED | Optional, defaults to FALSE | If enabled, reports a test warning as FAIL. See Test states for more information
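
As a sketch, a TEST job could tighten these defaults. The job name, report name, and TRUE values below are illustrative assumptions, not requirements:

pipelines/includes/local_includes/mate_jobs/my_strict_test_job.yml
"My Strict Test Job":
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  stage: "Transformation Testing"
  image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
  variables:
    TRANSFORM_ACTION: TEST
    REPORT_NAME: mate_tests.xml         # custom report file name
    TREAT_TEST_ERRORS_AS_FAILED: TRUE   # report ERROR results as FAIL
    TREAT_TEST_WARNS_AS_FAILED: TRUE    # report warnings as FAIL
  script:
    - /dataops
  artifacts:
    when: always
    reports:
      junit: $CI_PROJECT_DIR/mate_tests.xml  # must match REPORT_DIR/REPORT_NAME
  icon: ${TESTING_ICON}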

Partial parse

Partial parsing can improve the performance characteristics of DataOps pipeline runs by limiting the number of files a pipeline must parse every time it runs. Here, "parsing" means reading files in a project from disk and capturing ref() and config() method calls. These method calls are used to determine the following:

  • The shape of the dbt DAG (Directed Acyclic Graph)
  • The supplied resource configurations

There is no need to re-parse these files if partial parsing is enabled and the files are unchanged between job requests. The Transform orchestrator can use the parsed representation from the last job request. However, if a file has changed between invocations, then the orchestrator will re-parse the file and update the parsed node cache accordingly.

Partial parsing is enabled by default. To disable it, set TRANSFORM_PARTIAL_PARSE to 1.
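
For example, disabling it for a single job is a matter of adding the variable to that job's variables block (a minimal sketch):

variables:
  TRANSFORM_ACTION: RUN
  TRANSFORM_PARTIAL_PARSE: 1  # any non-blank value disables partial parsing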

To utilize partial parsing in a DataOps project, enable caching in the .modelling_and_transformation_base job by overriding the reference project's base job with a definition in your project, as follows:

pipelines/includes/local_includes/mate_jobs/base_job_modelling_and_transformation.yml
.modelling_and_transformation_base:
  image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
  variables:
    DATAOPS_TEMPLATES_DIR: /tmp/local_config
    DATAOPS_SECONDARY_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/modelling
  cache:
    key: $CI_PIPELINE_ID
    paths:
      - dataops/modelling/target/
  icon: ${TRANSFORM_ICON}

Defining a custom profiles.template.yml

Users can customize the profiles.template.yml used by the MATE orchestrator in the /dataops/profiles project directory. If no profiles.template.yml is defined in /dataops/profiles, a default one is used.

note

The rendered file /dataops/profiles/profiles.yml is removed after the /dataops script call to prevent exposing credentials. Users can set the parameter PREVENT_KEY_PAIR_CLEANUP to 1 to skip this removal. Setting it also skips cleaning up key-pair files (if in use).

caution

The /dataops/profiles/profiles.yml contains all the details required to connect to your Snowflake account, including sensitive information like credentials.

You can set DATABASE_PROFILE and DATABASE_TARGET to execute a macro with higher privileges. See Run a macro using the SOLE Admin role.

The default profiles.template.yml

The default profiles.template.yml is provided below. It defines the default DATABASE_PROFILE named snowflake_operations and the default DATABASE_TARGET named other.

/dataops/profiles/profiles.template.yml
snowflake_operations:
  target: other
  outputs:
    other:
      type: snowflake
      account: {{ SNOWFLAKE.ACCOUNT }}
      user: {{ SNOWFLAKE.TRANSFORM.USERNAME }}
      role: {{ SNOWFLAKE.TRANSFORM.ROLE }}
      {% if env.DATAOPS_SNOWFLAKE_AUTH %}
      {% if env.DATAOPS_SNOWFLAKE_AUTH == "KEY_PAIR" %}
      private_key_path: {{ env.DATAOPS_SNOWFLAKE_KEY_PAIR_PATH }}
      {% if env.DATAOPS_SNOWFLAKE_PASSPHRASE %}
      private_key_passphrase: {{ env.DATAOPS_SNOWFLAKE_PASSPHRASE }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.TRANSFORM.PASSWORD }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.TRANSFORM.PASSWORD }}
      {% endif %}
      database: {{ env.DATABASE }}
      schema: BASESCHEMA
      warehouse: {{ SNOWFLAKE.TRANSFORM.WAREHOUSE }}
      threads: {{ SNOWFLAKE.TRANSFORM.THREADS }}
      client_session_keep_alive: False
    qa:
      type: snowflake
      account: {{ SNOWFLAKE.ACCOUNT }}
      user: {{ SNOWFLAKE.TRANSFORM.USERNAME }}
      role: {{ SNOWFLAKE.TRANSFORM.ROLE }}
      {% if env.DATAOPS_SNOWFLAKE_AUTH %}
      {% if env.DATAOPS_SNOWFLAKE_AUTH == "KEY_PAIR" %}
      private_key_path: {{ env.DATAOPS_SNOWFLAKE_KEY_PAIR_PATH }}
      {% if env.DATAOPS_SNOWFLAKE_PASSPHRASE %}
      private_key_passphrase: {{ env.DATAOPS_SNOWFLAKE_PASSPHRASE }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.TRANSFORM.PASSWORD }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.TRANSFORM.PASSWORD }}
      {% endif %}
      database: {{ env.DATABASE }}
      schema: BASESCHEMA
      warehouse: {{ SNOWFLAKE.TRANSFORM.WAREHOUSE }}
      threads: {{ SNOWFLAKE.TRANSFORM.THREADS }}
      client_session_keep_alive: False
    prod:
      type: snowflake
      account: {{ SNOWFLAKE.ACCOUNT }}
      user: {{ SNOWFLAKE.TRANSFORM.USERNAME }}
      role: {{ SNOWFLAKE.TRANSFORM.ROLE }}
      {% if env.DATAOPS_SNOWFLAKE_AUTH %}
      {% if env.DATAOPS_SNOWFLAKE_AUTH == "KEY_PAIR" %}
      private_key_path: {{ env.DATAOPS_SNOWFLAKE_KEY_PAIR_PATH }}
      {% if env.DATAOPS_SNOWFLAKE_PASSPHRASE %}
      private_key_passphrase: {{ env.DATAOPS_SNOWFLAKE_PASSPHRASE }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.TRANSFORM.PASSWORD }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.TRANSFORM.PASSWORD }}
      {% endif %}
      database: {{ env.DATABASE }}
      schema: BASESCHEMA
      warehouse: {{ SNOWFLAKE.TRANSFORM.WAREHOUSE }}
      threads: {{ SNOWFLAKE.TRANSFORM.THREADS }}
      client_session_keep_alive: False

snowflake_ingestion:
  outputs:
    default:
      type: snowflake
      account: {{ SNOWFLAKE.ACCOUNT }}
      user: {{ SNOWFLAKE.INGESTION.USERNAME }}
      role: {{ SNOWFLAKE.INGESTION.ROLE }}
      {% if env.DATAOPS_SNOWFLAKE_AUTH %}
      {% if env.DATAOPS_SNOWFLAKE_AUTH == "KEY_PAIR" %}
      private_key_path: {{ env.DATAOPS_SNOWFLAKE_KEY_PAIR_PATH }}
      {% if env.DATAOPS_SNOWFLAKE_PASSPHRASE %}
      private_key_passphrase: {{ env.DATAOPS_SNOWFLAKE_PASSPHRASE }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.INGESTION.PASSWORD }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.INGESTION.PASSWORD }}
      {% endif %}
      database: {{ env.DATABASE }}
      schema: BASESCHEMA
      warehouse: {{ SNOWFLAKE.INGESTION.WAREHOUSE }}
      threads: {{ SNOWFLAKE.INGESTION.THREADS }}
      client_session_keep_alive: False

snowflake_master:
  outputs:
    default:
      type: snowflake
      account: {{ SNOWFLAKE.ACCOUNT }}
      user: {{ SNOWFLAKE.MASTER.USERNAME }}
      role: {{ SNOWFLAKE.MASTER.ROLE }}
      {% if env.DATAOPS_SNOWFLAKE_AUTH %}
      {% if env.DATAOPS_SNOWFLAKE_AUTH == "KEY_PAIR" %}
      private_key_path: {{ env.DATAOPS_SNOWFLAKE_KEY_PAIR_PATH }}
      {% if env.DATAOPS_SNOWFLAKE_PASSPHRASE %}
      private_key_passphrase: {{ env.DATAOPS_SNOWFLAKE_PASSPHRASE }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.MASTER.PASSWORD }}
      {% endif %}
      {% else %}
      password: {{ SNOWFLAKE.MASTER.PASSWORD }}
      {% endif %}
      database: {{ env.DATABASE }}
      schema: BASESCHEMA
      warehouse: {{ SNOWFLAKE.TRANSFORM.WAREHOUSE }}
      threads: {{ SNOWFLAKE.TRANSFORM.THREADS }}
      client_session_keep_alive: False

config:
  send_anonymous_usage_stats: False

Migrating to the /dataops/profiles directory

If you had customized the profiles.template.yml in the DataOps Runner host directory /app, you need to copy the content to the new project git directory /dataops/profiles. To leverage key pair authentication, start from the default profiles.template.yml and merge your modifications. In most cases, using the default file is sufficient.

If no profile file named profiles.template.yml is present in the project directory /dataops/profiles, the default will be used.

Authentication

Key pair authentication

The MATE orchestrator supports key pair authentication. To learn more, including how to configure it, see Key Pair Authentication.
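
The default profiles.template.yml above already branches on the key pair variables, so switching to key pair authentication amounts, in sketch form, to setting project variables along these lines (variable names are taken from the template; the key path is illustrative):

pipelines/includes/config/variables.yml
variables:
  DATAOPS_SNOWFLAKE_AUTH: KEY_PAIR                      # selects the private_key_path branch in the template
  DATAOPS_SNOWFLAKE_KEY_PAIR_PATH: /path/to/rsa_key.p8  # illustrative path to the private key file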

Example jobs

The examples below extend the base job .modelling_and_transformation_base to simplify the MATE job definition. See the reference project base job for all details about these examples.

The examples below cover the following scenarios:

  1. Build All Models
  2. Build a Directory of Models
  3. Build Tagged Models
  4. Test all Models
  5. Build Models by Running a Macro
  6. Run a Macro using the SOLE Admin Role

1. Build all models

Build all the models in your project:

pipelines/includes/local_includes/mate_jobs/build_all_models.yml
"Build all Models":
extends:
- .modeling_and_transformation_base
- .agent_tag
stage: "Data Transformation"
image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
variables:
TRANSFORM_ACTION: RUN
script:
- /dataops
icon: ${TRANSFORM_ICON}

2. Build a directory of models

Build all the models in the divisions/finance directory:

pipelines/includes/local_includes/mate_jobs/build_finance_models.yml
"Build Finance Models":
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  stage: "Data Transformation"
  image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
  variables:
    TRANSFORM_ACTION: RUN
    TRANSFORM_MODEL_SELECTOR: divisions/finance
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}

3. Build tagged models

Build all the models tagged finance:

pipelines/includes/local_includes/mate_jobs/build_tagged_models.yml
"Build Tagged Models":
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  stage: "Data Transformation"
  image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
  variables:
    TRANSFORM_ACTION: RUN
    TRANSFORM_MODEL_SELECTOR: tag:finance
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}

4. Test all Models

The TRANSFORM_MODEL_SELECTOR variable works the same way with TEST as it does with RUN.

pipelines/includes/local_includes/mate_jobs/test_all_models.yml
"Test all Models":
extends:
- .modeling_and_transformation_base
- .agent_tag
stage: "Transformation Testing"
image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
variables:
TRANSFORM_ACTION: TEST
script:
- /dataops
artifacts:
when: always
reports:
junit: $CI_PROJECT_DIR/report.xml
icon: ${TESTING_ICON}

5. Build models by running a macro

Rather than building/testing all or part of the MATE models, a MATE job can also execute a standalone macro as its primary operation.

pipelines/includes/local_includes/mate_jobs/my_mate_job.yml
"Run My Macro":
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  stage: "Additional Configuration"
  image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
  variables:
    TRANSFORM_ACTION: OPERATION
    TRANSFORM_OPERATION_NAME: my_macro
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
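
If my_macro takes arguments, TRANSFORM_OPERATION_ARGS (see the supported parameters above) can pass them as a YAML string. A hypothetical variant, reusing the example values from the parameter table:

pipelines/includes/local_includes/mate_jobs/my_mate_job_with_args.yml
"Run My Macro With Args":
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  stage: "Additional Configuration"
  image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
  variables:
    TRANSFORM_ACTION: OPERATION
    TRANSFORM_OPERATION_NAME: my_macro
    TRANSFORM_OPERATION_ARGS: "{arg1: value1, arg2: 345}"  # YAML string of macro arguments
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}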

6. Run a macro using the SOLE Admin role

Setting DATABASE_PROFILE and DATABASE_TARGET to the values snowflake_master and default, respectively, as in the example below, executes the macro using the higher privileges that SOLE uses.

pipelines/includes/local_includes/mate_jobs/my_mate_job.yml
"Run My Macro as Admin":
extends:
- .modeling_and_transformation_base
- .agent_tag
stage: "Additional Configuration"
image: $DATAOPS_TRANSFORM_RUNNER_IMAGE
variables:
TRANSFORM_ACTION: OPERATION
TRANSFORM_OPERATION_NAME: my_macro
DATABASE_PROFILE: snowflake_master
DATABASE_TARGET: default
script:
- /dataops
icon: ${TRANSFORM_ICON}