
dbt Cloud Orchestrator

Enterprise

Image: $DATAOPS_DBT_CLOUD_RUNNER_IMAGE

The dbt Cloud orchestrator triggers dbt Cloud jobs as part of a DataOps pipeline, making it possible to run predefined jobs within dbt Cloud as pipeline steps. It supports up to three environments: production (PROD), staging (STAGE), and development (DEV). Each environment requires a predefined dbt Cloud job tailored to its specific requirements.

Prerequisites

  • Access to dbt Cloud Teams or Enterprise edition. The Developer edition is not supported as it does not provide API access.

  • Predefined dbt Cloud jobs for each environment you want to use.

  • An understanding of the branching strategy for your DataOps pipeline.

  • Branch-to-environment mapping: The orchestrator automatically triggers the appropriate dbt Cloud job based on the current branch in your repository. By default, it applies the following mapping:

    • master or main branches > PROD environment
    • stage or qa branches > STAGE environment
    • All other branches (dev and feature branches) > DEV environment

While the default branch-to-environment mapping works for most scenarios, you can override this behavior.
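
To make the default behavior concrete, here is a minimal Python sketch of how a branch name resolves to an environment, checking the production pattern first, then staging, then development. The pattern strings and the resolve_environment function are illustrative stand-ins for the defaults documented below, not the orchestrator's actual implementation.

import re

# Illustrative stand-ins for the default patterns; the orchestrator's
# internal implementation may differ.
PROD_PATTERN = r"master|main"
STAGE_PATTERN = r"stage|qa"
DEV_PATTERN = r".*"   # matches anything not caught by the patterns above

def resolve_environment(branch: str) -> str:
    """Map a branch name to an environment, checking PROD, then STAGE, then DEV."""
    if re.search(PROD_PATTERN, branch):
        return "PROD"
    if re.search(STAGE_PATTERN, branch):
        return "STAGE"
    if re.search(DEV_PATTERN, branch):
        return "DEV"
    return "NONE"  # no dbt Cloud job is triggered

print(resolve_environment("main"))                 # PROD
print(resolve_environment("qa"))                   # STAGE
print(resolve_environment("feat/my_new_feature"))  # DEV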

Usage

pipelines/includes/local_includes/dbt_cloud_jobs/dbt_cloud.yml
"dbt Cloud Data Ingestion":
extends:
- .agent_tag
image: $DATAOPS_DBT_CLOUD_RUNNER_IMAGE
variables:
DBT_CLOUD_SERVICE_TOKEN: DATAOPS_VAULT(DBT_CLOUD.SERVICE_TOKEN)
DATAOPS_DBT_CLOUD_ACCOUNT_ID: <your account ID>
DATAOPS_DBT_CLOUD_PROD_JOB_ID: <your production job ID>
DATAOPS_DBT_CLOUD_STAGE_JOB_ID: <your staging job ID>
DATAOPS_DBT_CLOUD_DEV_JOB_ID: <your dev job ID>
DATAOPS_DBT_CLOUD_PROD_PATTERN: <pattern to match a production branch>
DATAOPS_DBT_CLOUD_STAGE_PATTERN: <pattern to match a staging branch>
DATAOPS_DBT_CLOUD_DEV_PATTERN: <pattern to match a development branch>
stage: "Data Ingestion"
script:
- /dataops
icon: ${DBT_CLOUD_ICON}

Supported parameters

Parameter | Required/Default | Description
DBT_CLOUD_SERVICE_TOKEN | REQUIRED | The token needed to authenticate to dbt Cloud
DATAOPS_DBT_CLOUD_ACCOUNT_ID | REQUIRED | The dbt Cloud account identifier
DATAOPS_DBT_CLOUD_API_HOST | Optional. Defaults to https://cloud.getdbt.com | The dbt Cloud URL
DATAOPS_DBT_CLOUD_PROD_JOB_ID | Optional | The dbt Cloud job ID to use when running in the production environment
DATAOPS_DBT_CLOUD_STAGE_JOB_ID | Optional | The dbt Cloud job ID to use when running in the staging environment
DATAOPS_DBT_CLOUD_DEV_JOB_ID | Optional | The dbt Cloud job ID to use when running in the development environment
DATAOPS_DBT_CLOUD_PROD_PATTERN | Optional. Defaults to master or main | A Python regular expression matched against the branch name. If it matches, the production job is triggered. The default matches branches containing main or master.
DATAOPS_DBT_CLOUD_STAGE_PATTERN | Optional. Defaults to stage or qa | A Python regular expression matched against the branch name. If it matches, the staging job is triggered. The default matches branches containing stage or qa, unless they also match the production pattern.
DATAOPS_DBT_CLOUD_DEV_PATTERN | Optional. Defaults to * | A Python regular expression matched against the branch name. If it matches, the development job is triggered. The default matches all branches not matched by the production and staging patterns.

Using the dbt Cloud orchestrator in all environments

DataOps.live's approach to an end-to-end data pipeline is to handle all of the project setup, infrastructure setup, data ingestion, data tests, data transformation, data quality, documentation, and data sharing needed to create value from all your data.

Within dbt Cloud, you have:

  • a production environment (explicitly marked)
  • a staging environment (in beta) (explicitly marked)
  • a development environment (the default environment)
  • no feature branch environments as such; feature branches are handled within the same database but in a separate schema per developer, named according to the dbt_<username> convention.

Compare that with DataOps.live, where a database is dynamically created whenever an environment is created and the corresponding database does not yet exist. Once the environment is no longer needed - especially for feature branches - the database is automatically deleted. A typical setup is:

Variable DATAOPS_PREFIX: <system name> (defaults to DATAOPS)

Let's review the default DataOps environment <> branch <> database mapping.

Environment | Branch | Database
PROD | main | <DATAOPS_PREFIX>_PROD
QA | qa | <DATAOPS_PREFIX>_QA
DEV | dev | <DATAOPS_PREFIX>_DEV
feature branch | feat/my_new_feature | <DATAOPS_PREFIX>_FB_FEAT_MY_NEW_FEATURE
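
To illustrate the feature-branch row of this table, the following sketch derives a database name from a branch name by upper-casing it, replacing non-alphanumeric characters with underscores, and adding the <DATAOPS_PREFIX>_FB_ prefix. The feature_branch_database function is a hypothetical approximation of the naming convention shown above; the exact sanitization rules DataOps applies may differ.

import re

DATAOPS_PREFIX = "DATAOPS"  # value of the DATAOPS_PREFIX variable

def feature_branch_database(branch: str) -> str:
    """Approximate the feature-branch database naming convention (illustrative only)."""
    sanitized = re.sub(r"[^A-Za-z0-9]+", "_", branch).upper()
    return f"{DATAOPS_PREFIX}_FB_{sanitized}"

print(feature_branch_database("feat/my_new_feature"))
# DATAOPS_FB_FEAT_MY_NEW_FEATURE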

The dbt Cloud API can trigger a given job in a given environment, but there is no way to pass arguments to that API call, e.g., variables like DBT_<MY_NAME>, that could dynamically pass a database name to the dev/feature branch environment and ensure its jobs run against the correct database. Given this limitation, let's focus on what is practical and possible today.
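
For context, triggering a job through the dbt Cloud Administrative API v2 looks roughly like the following sketch. The account ID, job ID, and token are placeholders, and the request body only carries a human-readable cause, which underlines the limitation described above: there is no field for passing dbt variables such as DBT_<MY_NAME> or a target database name.

import requests

# Placeholder values for illustration only.
API_HOST = "https://cloud.getdbt.com"   # see DATAOPS_DBT_CLOUD_API_HOST
ACCOUNT_ID = 253970                     # see DATAOPS_DBT_CLOUD_ACCOUNT_ID
JOB_ID = 568984                         # e.g., the production job
SERVICE_TOKEN = "<your service token>"  # stored in the DataOps vault

# Trigger a job run; the payload accepts a "cause" string but no arbitrary
# dbt variables.
response = requests.post(
    f"{API_HOST}/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers={"Authorization": f"Token {SERVICE_TOKEN}"},
    json={"cause": "Triggered by DataOps pipeline"},
)
response.raise_for_status()
print(response.json()["data"]["id"])  # ID of the newly created run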

Both dbt Cloud and DataOps.live have three long-lived environments. Therefore, both sides can continue to use a fixed setup of database names and connections in their respective products. The table below summarizes this:

Environment in DataOps.live | Branch | Database in DataOps.live | Database in dbt Cloud | Environment in dbt Cloud
PROD | main | <DATAOPS_PREFIX>_PROD (created by DataOps) | <DATAOPS_PREFIX>_PROD (used by dbt Cloud) | PROD
QA | qa | <DATAOPS_PREFIX>_QA (created by DataOps) | <DATAOPS_PREFIX>_QA (used by dbt Cloud) | STAGING
DEV | dev | <DATAOPS_PREFIX>_DEV (created by DataOps) | <DATAOPS_PREFIX>_DEV (used by dbt Cloud) | Default

You must take care in your one-time setup to align the database names correctly.

For short-lived environments, typically feature branches, DataOps normally creates a dedicated database just for that feature branch. Doing so enables you to work on the end-to-end data pipeline, e.g., changes to ingestion or data quality steps done with third-party tools that are neither dbt Cloud nor DataOps. The default setup is:

Environment in DataOps.live | Branch | Database in DataOps.live | Database in dbt Cloud | Environment in dbt Cloud
feature branch | feat/my_new_feature | <DATAOPS_PREFIX>_FB_FEAT_MY_NEW_FEATURE (created by DataOps) | not used in dbt Cloud by default | Default

For such scenarios, you can use the parameter DATAOPS_DBT_CLOUD_DEV_PATTERN to also execute dbt Cloud jobs on feature branches, as shown below, but be aware that these runs use the default database, schema, and other connection settings. Doing so can result in concurrency issues with your peers.
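
As a hypothetical illustration, a DEV pattern such as ^(dev|feat/) would cover both the dev branch and feature branches. The sketch below only shows how such a regular expression would match branch names; the pattern itself is an example, not a documented default.

import re

DEV_PATTERN = r"^(dev|feat/)"  # hypothetical pattern covering dev and feature branches

for branch in ["dev", "feat/my_new_feature", "docs/update-readme"]:
    if re.search(DEV_PATTERN, branch):
        print(f"{branch} -> triggers the DEV job")
    else:
        print(f"{branch} -> no dbt Cloud job is triggered")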

Consider using the dbt Cloud IDE exclusively for feature branches and doing all development and testing there. If the development scope covers multiple areas of the data pipeline, you can still run a DataOps pipeline to create a database, run ingestion, and take other steps.

End-to-end testing would then only be possible in the DEV environment, not the FEATURE BRANCH environment.

Example job

Authentication

For authentication, you need to create a service token in dbt Cloud. You can find the relevant information in the dbt documentation on service tokens.

Once you have created the token, add it to your DataOps.live vault under a descriptive key, e.g., DBT_CLOUD.SERVICE_TOKEN.

Example

This example sets up an integration with two jobs, one in production and one in staging.

  • For pipelines executed on the branch main, the dbt Cloud job 568984 is triggered.
  • For pipelines executed on the branch qa, the dbt Cloud job 567665 is triggered.
  • For all other branches, no dbt Cloud job is triggered.
pipelines/includes/local_includes/dbt_cloud_jobs/dbt_cloud.yml
"dbt Cloud Data Ingestion":
extends:
- .agent_tag
image: $DATAOPS_DBT_CLOUD_RUNNER_IMAGE
variables:
DBT_CLOUD_SERVICE_TOKEN: DATAOPS_VAULT(DBT_CLOUD.SERVICE_TOKEN)
DATAOPS_DBT_CLOUD_ACCOUNT_ID: 253970
DATAOPS_DBT_CLOUD_PROD_JOB_ID: 568984
DATAOPS_DBT_CLOUD_STAGE_JOB_ID: 567665
DATAOPS_DBT_CLOUD_PROD_PATTERN: main
DATAOPS_DBT_CLOUD_STAGE_PATTERN: qa
DATAOPS_DBT_CLOUD_DEV_PATTERN: null
stage: "Data Ingestion"
script:
- /dataops
icon: ${DBT_CLOUD_ICON}