Global Project Settings

The global settings for your project are in three essential files:

/
├── pipelines/
│   └── includes/
│       ├── config/
│       │   ├── agent_tag.yml
│       │   ├── stages.yml
│       │   └── variables.yml

DataOps runner selection with agent_tag

The agent_tag.yml file defines which DataOps runner to use for your pipeline. Choosing and configuring your runner tag is described in the DataOps Docker Runner Installation or the DataOps Kubernetes Runner Installation, depending on which runner type you use.
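
A minimal agent_tag.yml typically defines a hidden job that other jobs extend to pick up the runner tag. The sketch below is an assumption rather than a canonical file: the tag name is only an illustration and must match the tag configured on your runner.

pipelines/includes/config/agent_tag.yml
.agent_tag:
  tags:
    - dataops-production-runner   # illustrative tag; replace with your runner's tag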

Project variables

The project settings in pipelines/includes/config/variables.yml contain the key DataOps variables driving consistent behavior across all your pipelines.

DataOps uses a set of variables prefixed by DATAOPS_ across the entire data product platform.

The variables you can set in your project and commit to variables.yml are listed in the table below.

note

You can also add transformation parameters to the project variables.yml file to apply MATE operations at the project level, i.e., to all jobs in the pipeline.

| DataOps Variable | Default Value | Behavior/Description |
| --- | --- | --- |
| DATAOPS_PREFIX | DATAOPS | Defines the prefix used for all Snowflake objects |
| DATAOPS_DEBUG | unset | If set, provides extensive job logging with confidential secret values masked |
| DATAOPS_SOLE_DEBUG | unset | If set, provides extensive SOLE logging with all credentials masked |
| DATAOPS_SOLE_WAREHOUSE | unset | Snowflake warehouse to use for SOLE queries when the SOLE user has no default warehouse |
| DATAOPS_BRANCH_NAME_PROD | main | Name of the branch representing the production environment |
| DATAOPS_BRANCH_NAME_DEV | dev | Name of the branch representing the development environment |
| DATAOPS_BRANCH_NAME_QA | qa | Name of the branch representing the test environment |
| DATAOPS_ENV_NAME_PROD | PROD | Name of the production environment |
| DATAOPS_ENV_NAME_DEV | DEV | Name of the development environment |
| DATAOPS_ENV_NAME_QA | QA | Name of the test environment |
| DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME | unset | Use this environment name for all feature branches (except for the default DataOps database) |
| DATAOPS_EXTRA_BEFORE_SCRIPTS | unset | List of custom before scripts exposing additional project variables at runtime |
| DATAOPS_EXTRA_REFERENCE_PROJECTS | unset | List of reference projects used in addition to the DataOps Reference Project |
| DATAOPS_PREVENT_OBJECT_DELETION | unset | If set, objects can only be deleted by setting the deleted attribute on them. Caution: running a pipeline with LIFECYCLE_STATE_RESET ignores the restriction on deleting objects set by DATAOPS_PREVENT_OBJECT_DELETION. |
| DATAOPS_SECRETS_DIR | /secrets | Persistent storage directory used by an orchestrator for the DataOps Vault, mounted from the DataOps Runner host |
| DATAOPS_VAULT_KEY | $CI_COMMIT_SHA | Partial key used to encrypt the content of the DataOps Vault at runtime |
| DATAOPS_VAULT_CONTENT | $CI_PROJECT_DIR/vault-content/vault.yml | Content of the DataOps Vault at design time |
| DATAOPS_VAULT_SALT_FILE | $DATAOPS_SECRETS_DIR/vault.salt | Persistent storage location of the salt file used by the DataOps Vault as a key together with DATAOPS_VAULT_KEY, mounted from the DataOps Runner host |
| DATAOPS_ENABLE_BEHAVIOR_CHANGE_BUNDLE | unset | If set, activates all the features under the flag, including behavioral changes that may break backward compatibility. The value must follow the format <year_month> corresponding to the release month, for example 2023_08. See Bundled Feature Flags for more information. |
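
For example, a project might pin a few of these values in variables.yml. The sketch below is illustrative only: the values shown are assumptions, and the top-level variables: block follows the GitLab-CI-style syntax used by DataOps pipeline configuration files.

pipelines/includes/config/variables.yml
variables:
  DATAOPS_PREFIX: DATAOPS
  DATAOPS_BRANCH_NAME_PROD: main
  DATAOPS_ENV_NAME_PROD: PROD
  DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME: DEV
  # DATAOPS_DEBUG: 1    # uncomment for verbose job logging (secret values stay masked)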

Pipeline variables

At pipeline execution time, further DATAOPS_ variables are derived and available in your jobs.

DATAOPS_DATABASE

The variable DATAOPS_DATABASE is available at pipeline execution time. The value is computed as ${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME}.

You can access it via template rendering in your configuration files as {{ env.DATAOPS_DATABASE }}.
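
For example, an ingestion or modeling configuration could point at the environment's database like this (a hypothetical excerpt; the surrounding keys are illustrative, and only the {{ env.DATAOPS_DATABASE }} expression is the documented syntax):

target:
  database: "{{ env.DATAOPS_DATABASE }}"  # e.g. DATAOPS_FB_COMBINED_WF in a feature branch
  schema: STAGING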

DATAOPS_DATABASE_MASTER

The variable DATAOPS_DATABASE_MASTER is available at pipeline execution time. The value is the name of the production database and is computed as ${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME_PROD}.

You can access it via template rendering in your configuration files as {{ env.DATAOPS_DATABASE_MASTER }}.
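
A common pattern is to read reference data from the production database while writing into the current environment's database (again a hypothetical excerpt with illustrative key names):

source_database: "{{ env.DATAOPS_DATABASE_MASTER }}"  # e.g. DATAOPS_PROD
target_database: "{{ env.DATAOPS_DATABASE }}"         # e.g. DATAOPS_DEV or DATAOPS_FB_FEATURE1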

DATAOPS_ENV_NAME

DATAOPS_ENV_NAME is computed at runtime based on the environment the pipeline runs for. The value will be one of:

  • DATAOPS_ENV_NAME_PROD,
  • DATAOPS_ENV_NAME_DEV,
  • DATAOPS_ENV_NAME_QA, or
  • FB_${branch_clean}, e.g. DATAOPS_ENV_NAME=FB_COMBINED_WF for the branch combined-wf

You can access it via template rendering in your configuration files as {{ env.DATAOPS_ENV_NAME }}.

DATAOPS_NONDB_ENV_NAME

This variable can be used interchangeably with DATAOPS_ENV_NAME to namespace account-level Snowflake objects (apart from the default DataOps database) into a pre-existing environment, avoiding a proliferation of roles, warehouses, etc. in feature branch environments. Instead, feature branches can reuse these resources from an existing environment.

If the configuration variable DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME is set to the name of an environment (typically DEV), the variable DATAOPS_NONDB_ENV_NAME will be set to that name for all feature branch environments (i.e. not PROD, QA or DEV).

Example Usage

If DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME is set to DEV (and branch/environment names use the DataOps default values):

| Branch | DATAOPS_ENV_NAME | DATAOPS_NONDB_ENV_NAME |
| --- | --- | --- |
| main | PROD | PROD |
| qa | QA | QA |
| dev | DEV | DEV |
| feature1 | FB_FEATURE1 | DEV |
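
In practice this means account-level object names can be built from DATAOPS_NONDB_ENV_NAME while the database keeps its per-branch name, for example (a hypothetical excerpt; the key names are illustrative):

warehouse: "{{ env.DATAOPS_PREFIX }}_{{ env.DATAOPS_NONDB_ENV_NAME }}_WAREHOUSE"  # feature1 -> DATAOPS_DEV_WAREHOUSE
database: "{{ env.DATAOPS_PREFIX }}_{{ env.DATAOPS_ENV_NAME }}"                   # feature1 -> DATAOPS_FB_FEATURE1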

Pipeline stages

The default Pipeline Stages are included from the DataOps Reference Project stages.yml. The default stages represent all the stages of execution DataOps.live has seen with customers.

The DataOps template project provides a simplified list of stages:

template project stages.yml
stages:
- Pipeline Initialisation
- Vault Initialisation
- Snowflake Setup
- Additional Configuration
- Data Ingestion
- Source Testing
- Data Transformation
- Transformation Testing
- Generate Docs
- Clean Up

There may still be cases where you want to define your own stage names and your own sequence of stages for your project. Do so by modifying pipelines/includes/config/stages.yml. Using the template project stages as a starting point, you can add or remove some of the default stages and provide your own:

pipelines/includes/default/stages.yml
stages:
- Pipeline Initialisation # reserved
- Vault Initialisation # reserved
- Snowflake Setup # reserved
# - Additional Configuration
- Data Ingestion
- Source Testing
- Data Transformation
- Transformation Testing
- Publish Data # added
- Generate Docs # reserved
- Clean Up # reserved
reserved stages

Stages marked as # reserved should not be removed from the stages definition, as the data product platform depends on them.

The order of the stages is important, as it defines the order in which jobs execute. Stages run sequentially, one after the other.
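
For example, a job placed in the Publish Data stage added above only starts once every job in the preceding Transformation Testing stage has finished. The job definition below is a hypothetical sketch; the job name and script are illustrative:

Publish Curated Data:
  extends:
    - .agent_tag        # pick up the runner tag defined in agent_tag.yml
  stage: Publish Data   # must match one of the stages declared in stages.yml
  script:
    - echo "Publishing curated data"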

project settings stages override the reference project stages

Because pipelines/includes/config/stages.yml is included after the reference project's base_bootstrap.yml, the pipeline uses the stage configuration from the project's includes folder rather than the stages defined in the reference project.
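
The include order that produces this override looks roughly like the sketch below; the project path and ref are hypothetical, and your actual include list lives in the project's pipeline definition file (for example full-ci.yml):

include:
  # the DataOps Reference Project supplies the default stages and jobs
  - project: "reference-template-projects/dataops-template/dataops-reference"
    ref: 5-stable
    file: "/pipelines/includes/base_bootstrap.yml"
  # project-level configuration comes afterwards, so its stages.yml takes precedence
  - local: /pipelines/includes/config/stages.yml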