
DataOps Project Settings

Your project settings live in three key files:

/
├── pipelines/
│   └── includes/
│       └── config/
│           ├── agent_tag.yml
│           ├── stages.yml
│           └── variables.yml

DataOps Runner Selection

The agent_tag.yml file selects which DataOps Runner executes your pipelines, using the runner tag configured during the DataOps Runner Installation.
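
A minimal sketch of this file, assuming a runner registered with the tag dataops-production-runner (your tag will differ, and the exact structure in your project may vary):

pipelines/includes/config/agent_tag.yml
.agent_tag:
  tags:
    - dataops-production-runner

Jobs typically pick this up via extends: .agent_tag, so they run on the tagged runner.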

Project Variables

The project settings found in pipelines/includes/config/variables.yml are the key DataOps variables driving consistent behavior across all your pipelines.

DataOps uses a set of variables prefixed by DATAOPS_ across the entire platform.

Project variables that you can set in your project and commit to variables.yml are:

| DataOps Variable | Default Value | Behavior/Description |
| --- | --- | --- |
| DATAOPS_PREFIX | DATAOPS | Defines the prefix used for all Snowflake objects |
| DATAOPS_DEBUG | unset | If set, provides extensive job logging, potentially including credentials |
| DATAOPS_SOLE_DEBUG | unset | If set, provides extensive SOLE logging, masking all credentials |
| DATAOPS_SOLE_WAREHOUSE | unset | Snowflake warehouse to use for SOLE queries when the SOLE user has no default warehouse |
| DATAOPS_BRANCH_NAME_PROD | master | Name of the branch representing the production environment |
| DATAOPS_BRANCH_NAME_DEV | dev | Name of the branch representing the development environment |
| DATAOPS_BRANCH_NAME_QA | qa | Name of the branch representing the test environment |
| DATAOPS_ENV_NAME_PROD | PROD | Name of the production environment |
| DATAOPS_ENV_NAME_DEV | DEV | Name of the development environment |
| DATAOPS_ENV_NAME_QA | QA | Name of the test environment |
| DATAOPS_EXTRA_BEFORE_SCRIPTS | unset | List of custom before scripts exposing additional project variables at runtime |
| DATAOPS_EXTRA_REFERENCE_PROJECTS | unset | List of reference projects used in addition to the DataOps Reference Project |
| DATAOPS_SECRETS_DIR | /secrets | Persistent storage directory used by an orchestrator for the DataOps Vault, mounted from the DataOps Runner host |
| DATAOPS_VAULT_KEY | $CI_COMMIT_SHA | A partial key used to encrypt the content of the DataOps Vault at runtime |
| DATAOPS_VAULT_CONTENT | $CI_PROJECT_DIR/vault-content/vault.yml | Content of the DataOps Vault at design time |
| DATAOPS_VAULT_SALT_FILE | $DATAOPS_SECRETS_DIR/vault.salt | Persistent storage location of the salt file used by the DataOps Vault as a key together with DATAOPS_VAULT_KEY, mounted from the DataOps Runner host |
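
For example, a variables.yml that overrides a few of these defaults might look like the following (the values shown are illustrative):

pipelines/includes/config/variables.yml
variables:
  DATAOPS_PREFIX: ACME              # Snowflake objects become ACME_*
  DATAOPS_BRANCH_NAME_PROD: main    # production branch is called main, not master
  DATAOPS_ENV_NAME_QA: TEST         # test environment named TEST instead of QA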

Pipeline Variables

At pipeline execution time, further DATAOPS_ variables are derived and made available in your jobs.

DATAOPS_DATABASE

The variable DATAOPS_DATABASE is available at pipeline execution time. Its value is computed as ${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME}.

You can access it via template rendering in your configuration files as {{ env.DATAOPS_DATABASE }}.
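
With the default DATAOPS_PREFIX of DATAOPS and a pipeline running for the PROD environment, DATAOPS_DATABASE resolves to DATAOPS_PROD. As a hedged sketch, a template-rendered configuration file (file name and keys are illustrative) could use it like this:

databases.template.yml
databases:
  "{{ env.DATAOPS_DATABASE }}":
    comment: Main database for the current pipeline environment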

DATAOPS_DATABASE_MASTER

The variable DATAOPS_DATABASE_MASTER is available at pipeline execution time. Its value is the name of the production database, computed as ${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME_PROD}.

You can access it via template rendering in your configuration files as {{ env.DATAOPS_DATABASE_MASTER }}.
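
This lets non-production pipelines refer back to production data. A minimal illustration, with hypothetical keys:

example.template.yml
# always resolves to the production database, e.g. DATAOPS_PROD
production_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
# resolves to the current environment's database, e.g. DATAOPS_DEV
environment_database: "{{ env.DATAOPS_DATABASE }}"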

DATAOPS_ENV_NAME

DATAOPS_ENV_NAME is computed at runtime based on the environment the pipeline runs for. Its value will be one of:

  • the value of DATAOPS_ENV_NAME_PROD,
  • the value of DATAOPS_ENV_NAME_DEV,
  • the value of DATAOPS_ENV_NAME_QA, or
  • FB_${branch_clean} for feature branches, e.g. DATAOPS_ENV_NAME=FB_COMBINED_WF for the branch combined-wf

You can access it via template rendering in your configuration files as {{ env.DATAOPS_ENV_NAME }}.
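
For example, a rendered configuration value can embed the environment name (the key shown is hypothetical):

settings.template.yml
# renders as "Reporting objects for the FB_COMBINED_WF environment" on branch combined-wf
reporting_comment: "Reporting objects for the {{ env.DATAOPS_ENV_NAME }} environment"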

Pipeline Stages

The default Pipeline Stages are included from the DataOps Reference Project stages.yml. The default stages represent all the stages of execution DataOps.live has seen with customers.

The DataOps template project provides a simplified list of stages:

template project stages.yml
stages:
- Pipeline Initialisation
- Vault Initialisation
- Snowflake Setup
- Additional Configuration
- Data Ingestion
- Source Testing
- Data Transformation
- Transformation Testing
- Generate Docs
- Clean Up

There are still cases where you will want to define your own stage names and stage sequences for your project. Do so by modifying pipelines/includes/config/stages.yml. Using the template project stages as a starting point, you can add or remove some of the default stages and provide your own:

pipelines/includes/config/stages.yml
stages:
- Pipeline Initialisation # reserved
- Vault Initialisation # reserved
- Snowflake Setup # reserved
# - Additional Configuration
- Data Ingestion
- Source Testing
- Data Transformation
- Transformation Testing
- Publish Data # added
- Generate Docs # reserved
- Clean Up # reserved
Reserved stages

Stages marked # reserved must not be removed from the stages definition, as the DataOps platform depends on them.

The order of the stages is important: it defines the order in which jobs execute. Stages run sequentially, one after the other.
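
As an illustration, a job assigned to the custom Publish Data stage added above might look like this (the job name, script, and local_includes path are assumptions):

pipelines/includes/local_includes/publish_data.yml
"Publish Data Mart":
  extends:
    - .agent_tag
  stage: Publish Data
  script:
    - echo "Publishing the data mart..."

Because Publish Data is listed between Transformation Testing and Generate Docs, this job only runs after all transformation tests have completed.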

Project settings stages override the reference project stages

Because pipelines/includes/config/stages.yml is included after the default base_bootstrap.yml, the pipeline will use the stage configuration from the project's includes folder rather than the reference project stages.
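
A hedged sketch of such an include order in a pipeline file (the reference project path and ref shown are illustrative):

full-ci.yml
include:
  # reference project defaults are included first...
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  # ...then project-level configuration, whose stages definition takes precedence
  - /pipelines/includes/config/agent_tag.yml
  - /pipelines/includes/config/variables.yml
  - /pipelines/includes/config/stages.yml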