# Global Project Settings
The global settings for your project live in three essential files:

```text
/
├── pipelines/
│   └── includes/
│       ├── config/
│       │   ├── agent_tag.yml
│       │   ├── stages.yml
│       │   └── variables.yml
```

- `agent_tag.yml` drives the DataOps runner selection
- `stages.yml` defines the well-known pipeline stages
- `variables.yml` defines all global project variables
## DataOps runner selection with agent_tag

The `agent_tag.yml` file defines which DataOps runner to use for your pipeline. Choosing and configuring your runner tag is described in the DataOps Docker Runner Installation or the DataOps Kubernetes Runner Installation, respectively.
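As a minimal sketch, `agent_tag.yml` holds a hidden job that pipeline jobs extend to pick up the runner tag; the tag value `dataops-prod-runner` below is an assumption and must match whatever tag you registered your runner with:

```yaml
# pipelines/includes/config/agent_tag.yml
# Hidden job extended by pipeline jobs to select the DataOps runner.
.agent_tag:
  tags:
    - dataops-prod-runner # illustrative value: use your registered runner tag
```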
## Project variables

The project settings in `pipelines/includes/config/variables.yml` contain the key DataOps variables driving consistent behavior across all your pipelines. DataOps uses a set of variables prefixed with `DATAOPS_` across the entire data product platform. The variables you can set in your project and commit to `variables.yml` are listed in the table below.

You can also add transformation parameters to the project `variables.yml` file to apply MATE operations at the project level, i.e., to all jobs in the pipeline.
| DataOps variable | Default value | Behavior/Description |
|---|---|---|
| `DATAOPS_PREFIX` | `DATAOPS` | Defines the prefix used for all Snowflake objects |
| `DATAOPS_DEBUG` | unset | If set, provides extensive job logging, with confidential secret values being masked |
| `DATAOPS_SOLE_DEBUG` | unset | If set, provides extensive SOLE logging, masking all credentials |
| `DATAOPS_SOLE_WAREHOUSE` | unset | Snowflake warehouse to use for SOLE queries when the SOLE user has no default warehouse |
| `DATAOPS_BRANCH_NAME_PROD` | `main` | Name of the branch representing the production environment |
| `DATAOPS_BRANCH_NAME_DEV` | `dev` | Name of the branch representing the development environment |
| `DATAOPS_BRANCH_NAME_QA` | `qa` | Name of the branch representing the test environment |
| `DATAOPS_ENV_NAME_PROD` | `PROD` | Name of the production environment |
| `DATAOPS_ENV_NAME_DEV` | `DEV` | Name of the development environment |
| `DATAOPS_ENV_NAME_QA` | `QA` | Name of the test environment |
| `DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME` | unset | Use this environment name for all feature branches (except for the default DataOps database) |
| `DATAOPS_EXTRA_BEFORE_SCRIPTS` | unset | List of custom before scripts exposing additional project variables at runtime |
| `DATAOPS_EXTRA_REFERENCE_PROJECTS` | unset | List of reference projects used in addition to the DataOps Reference Project |
| `DATAOPS_PREVENT_OBJECT_DELETION` | set | By default, objects can only be deleted by setting the `deleted` attribute on them. Caution: running a pipeline with `LIFECYCLE_STATE_RESET` ignores the restriction on deleting objects set by `DATAOPS_PREVENT_OBJECT_DELETION`. |
| `DATAOPS_SECRETS_DIR` | `/secrets` | Persistent storage directory inside an orchestrator, used for the DataOps Vault; mounted from the DataOps Runner host |
| `DATAOPS_VAULT_KEY` | `$CI_COMMIT_SHA` | Partial key used to encrypt the content of the DataOps Vault at runtime |
| `DATAOPS_VAULT_CONTENT` | `$CI_PROJECT_DIR/vault-content/vault.yml` | Content of the DataOps Vault at design time |
| `DATAOPS_VAULT_SALT_FILE` | `$DATAOPS_SECRETS_DIR/vault.salt` | Persistent storage location of the salt file used by the DataOps Vault as a key together with `DATAOPS_VAULT_KEY`; mounted from the DataOps Runner host |
| `DATAOPS_ENABLE_BEHAVIOR_CHANGE_BUNDLE` | unset | If set, activates all the features under the flag, including behavioral changes that may break backward compatibility. The value must follow the format `<year_month>` corresponding to the release month, e.g. `2023_08`. See Bundled Feature Flags for more information. |
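As an illustrative sketch, a project `variables.yml` committing a few of these settings might look like this; the specific values chosen here are examples, not recommendations:

```yaml
# pipelines/includes/config/variables.yml
variables:
  DATAOPS_PREFIX: ACME # example: prefix all Snowflake objects with ACME_
  DATAOPS_BRANCH_NAME_DEV: develop # example: this project names its dev branch "develop"
  DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME: DEV # reuse DEV resources in feature branches
```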
## Pipeline variables

At pipeline execution time, further `DATAOPS_` variables are derived and made available in your jobs.
### DATAOPS_PREVENT_OBJECT_DELETION

This variable is set by default. If you need to disable it, set the variable `DATAOPS_PREVENT_OBJECT_DELETION` to `0`.
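For example, assuming `variables.yml` uses the standard `variables:` block shown earlier, you can disable the guard like this:

```yaml
variables:
  DATAOPS_PREVENT_OBJECT_DELETION: 0 # allow deletion without the deleted attribute
```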
### DATAOPS_DATABASE

The variable `DATAOPS_DATABASE` is available at pipeline execution time. The value is `${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME}`.

You can access it via template rendering in your configuration files as `{{ env.DATAOPS_DATABASE }}`.
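A hypothetical configuration fragment showing the rendering; the `databases:` and `comment:` keys here are illustrative, not a confirmed schema:

```yaml
databases:
  "{{ env.DATAOPS_DATABASE }}": # renders to e.g. DATAOPS_PROD or DATAOPS_FB_FEATURE1
    comment: Default DataOps database for this environment
```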
### DATAOPS_DATABASE_MASTER

The variable `DATAOPS_DATABASE_MASTER` is available at pipeline execution time. The value is the name of the production database and is computed as `${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME_PROD}`.

You can access it via template rendering in your configuration files as `{{ env.DATAOPS_DATABASE_MASTER }}`.
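Since this always resolves to the production database regardless of the pipeline's own environment, one plausible use is reading production data from a non-production pipeline, e.g. in a dbt-style source definition rendered before execution (the source and table names here are hypothetical):

```yaml
# Hypothetical sources file rendered before execution
version: 2
sources:
  - name: production_data
    database: "{{ env.DATAOPS_DATABASE_MASTER }}" # always resolves to the PROD database
    tables:
      - name: customers
```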
### DATAOPS_ENV_NAME

`DATAOPS_ENV_NAME` is computed at runtime based on the environment the pipeline runs for. The value will be one of:

- `DATAOPS_ENV_NAME_PROD`,
- `DATAOPS_ENV_NAME_DEV`,
- `DATAOPS_ENV_NAME_QA`, or
- `FB_${branch_clean}`, e.g. `DATAOPS_ENV_NAME=FB_COMBINED_WF` for the branch `combined-wf`

You can access it via template rendering in your configuration files as `{{ env.DATAOPS_ENV_NAME }}`.
### DATAOPS_NONDB_ENV_NAME

This variable can be used interchangeably with `DATAOPS_ENV_NAME` to namespace account-level Snowflake objects (apart from the default DataOps database) into a pre-existing environment, avoiding a proliferation of roles, warehouses, etc. in feature branch environments. Instead, feature branches can reuse these resources from an existing environment.

If the configuration variable `DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME` is set to the name of an environment (typically `DEV`), the variable `DATAOPS_NONDB_ENV_NAME` will be set to that name for all feature branch environments (i.e. not `PROD`, `QA`, or `DEV`).
#### Example usage

If `DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME` is set to `DEV` (and branch/environment names use the DataOps default values):

| Branch | `DATAOPS_ENV_NAME` | `DATAOPS_NONDB_ENV_NAME` |
|---|---|---|
| `main` | `PROD` | `PROD` |
| `qa` | `QA` | `QA` |
| `dev` | `DEV` | `DEV` |
| `feature1` | `FB_FEATURE1` | `DEV` |
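A sketch of the resource reuse this enables: namespacing an account-level object with `DATAOPS_NONDB_ENV_NAME` instead of `DATAOPS_ENV_NAME`, so every feature branch resolves to the same `DEV` warehouse (the `warehouses:` keys shown are illustrative, not a confirmed schema):

```yaml
warehouses:
  "{{ env.DATAOPS_PREFIX }}_{{ env.DATAOPS_NONDB_ENV_NAME }}_INGESTION":
    warehouse_size: XSMALL
    comment: Shared by all feature branches when DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME is DEV
```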
## Pipeline stages

The default Pipeline Stages are included from the DataOps Reference Project `stages.yml`. The default stages represent all the stages of execution DataOps.live has seen with customers.

The DataOps template project provides a simplified list of stages:
```yaml
stages:
  - Pipeline Initialisation
  - Vault Initialisation
  - Snowflake Setup
  - Additional Configuration
  - Data Ingestion
  - Source Testing
  - Data Transformation
  - Transformation Testing
  - Generate Docs
  - Clean Up
```
There are still cases where you want to define your own stage names and sequences of stages in your project. Do so by modifying `pipelines/includes/config/stages.yml`. Using the template project stages as a starting point, you can add or remove some of the default stages and provide your own:
```yaml
stages:
  - Pipeline Initialisation # reserved
  - Vault Initialisation # reserved
  - Snowflake Setup # reserved
  # - Additional Configuration
  - Data Ingestion
  - Source Testing
  - Data Transformation
  - Transformation Testing
  - Publish Data # added
  - Generate Docs # reserved
  - Clean Up # reserved
```
Stages marked as `# reserved` should not be removed from the stage definition, as the data product platform depends on them.

The order of the stages is important, as it defines the execution order of jobs: each stage runs sequentially after the previous one.
Because `pipelines/includes/config/stages.yml` is included after the default `base_bootstrap.yml`, the pipeline uses the stage configuration from the project's includes folder rather than the reference project stages.
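As an illustrative sketch of that include ordering (the reference project path and `ref` below are placeholders, not the actual values):

```yaml
include:
  # The reference project supplies the default stages via base_bootstrap.yml
  - project: reference-template-projects/dataops-reference # placeholder path
    ref: main # placeholder ref
    file: /pipelines/includes/base_bootstrap.yml
  # Included afterwards, so this stage list overrides the defaults
  - /pipelines/includes/config/stages.yml
```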