# Global Project Settings
The global settings for your project live in three essential files:

```text
/
├── pipelines/
│   └── includes/
│       ├── config/
│       │   ├── agent_tag.yml
│       │   ├── stages.yml
│       │   └── variables.yml
```

- `agent_tag.yml` drives the DataOps runner selection
- `stages.yml` defines the well-known pipeline stages
- `variables.yml` defines all global project variables
## DataOps runner selection with agent_tag

The `agent_tag.yml` file defines which DataOps runner to use for your pipeline. Choosing and configuring your runner tag is described in the DataOps Docker Runner Installation or the DataOps Kubernetes Runner Installation, respectively.
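As a minimal sketch, `agent_tag.yml` holds a hidden job that pipeline jobs extend to pick up the runner tag; the tag value `dataops-prod-runner` below is an assumption and must match whatever tag you registered your runner with:

```yaml
# pipelines/includes/config/agent_tag.yml
# Hidden job extended by pipeline jobs to select the DataOps runner.
.agent_tag:
  tags:
    - dataops-prod-runner # illustrative value: use your registered runner tag
```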
## Project variables

The project settings in `pipelines/includes/config/variables.yml` contain the key DataOps variables driving consistent behavior across all your pipelines. DataOps uses a set of variables prefixed with `DATAOPS_` across the entire data product platform. The variables you can set in your project and commit to `variables.yml` are listed in the table below.

You can also add transformation parameters to the project `variables.yml` file to apply MATE operations at the project level, i.e., to all jobs in the pipeline.
| DataOps variable | Default value | Behavior/Description |
|---|---|---|
| `DATAOPS_PREFIX` | `DATAOPS` | Defines the prefix used for all Snowflake objects |
| `DATAOPS_DEBUG` | unset | If set, provides extensive job logging, with confidential secret values being masked |
| `DATAOPS_SOLE_DEBUG` | unset | If set, provides extensive SOLE logging, masking all credentials |
| `DATAOPS_SOLE_WAREHOUSE` | unset | Snowflake warehouse to use for SOLE queries when the SOLE user has no default warehouse |
| `DATAOPS_BRANCH_NAME_PROD` | `main` | Name of the branch representing the production environment |
| `DATAOPS_BRANCH_NAME_DEV` | `dev` | Name of the branch representing the development environment |
| `DATAOPS_BRANCH_NAME_QA` | `qa` | Name of the branch representing the test environment |
| `DATAOPS_ENV_NAME_PROD` | `PROD` | Name of the production environment |
| `DATAOPS_ENV_NAME_DEV` | `DEV` | Name of the development environment |
| `DATAOPS_ENV_NAME_QA` | `QA` | Name of the test environment |
| `DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME` | unset | Use this environment name for all feature branches (except for the default DataOps database) |
| `DATAOPS_EXTRA_BEFORE_SCRIPTS` | unset | List of custom before scripts exposing additional project variables at runtime |
| `DATAOPS_EXTRA_REFERENCE_PROJECTS` | unset | List of reference projects used in addition to the DataOps Reference Project |
| `DATAOPS_PREVENT_OBJECT_DELETION` | set | By default, objects can only be deleted by setting the `deleted` attribute on them. Caution: running a pipeline with `LIFECYCLE_STATE_RESET` ignores the restriction on deleting objects set by `DATAOPS_PREVENT_OBJECT_DELETION`. |
| `DATAOPS_SECRETS_DIR` | `/secrets` | Persistent storage directory inside an orchestrator, used for the DataOps Vault; mounted from the DataOps Runner host |
| `DATAOPS_VAULT_KEY` | `$CI_COMMIT_SHA` | Partial key used to encrypt the content of the DataOps Vault at runtime |
| `DATAOPS_VAULT_CONTENT` | `$CI_PROJECT_DIR/vault-content/vault.yml` | Content of the DataOps Vault at design time |
| `DATAOPS_VAULT_SALT_FILE` | `$DATAOPS_SECRETS_DIR/vault.salt` | Persistent storage location of the salt file used by the DataOps Vault as a key together with `DATAOPS_VAULT_KEY`; mounted from the DataOps Runner host |
| `DATAOPS_ENABLE_BEHAVIOR_CHANGE_BUNDLE` | unset | If set, activates all the features under the flag, including behavioral changes that may break backward compatibility. The value must follow the format `<year_month>` corresponding to the release month, e.g. `2023_08`. See Bundled Feature Flags for more information. |
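As an illustrative sketch, a project `variables.yml` committing a few of these settings might look like this; the specific values chosen here are examples, not recommendations:

```yaml
# pipelines/includes/config/variables.yml
variables:
  DATAOPS_PREFIX: ACME # example: prefix all Snowflake objects with ACME_
  DATAOPS_BRANCH_NAME_DEV: develop # example: this project names its dev branch "develop"
  DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME: DEV # reuse DEV resources in feature branches
```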
## Pipeline variables

At pipeline execution time, further `DATAOPS_` variables are derived and made available in your jobs.
### DATAOPS_PREVENT_OBJECT_DELETION

This variable is set by default. If you need to disable it, set the variable `DATAOPS_PREVENT_OBJECT_DELETION` to `0`.
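For example, assuming `variables.yml` uses the standard `variables:` block shown earlier, you can disable the guard like this:

```yaml
variables:
  DATAOPS_PREVENT_OBJECT_DELETION: 0 # allow deletion without the deleted attribute
```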
### DATAOPS_DATABASE

The variable `DATAOPS_DATABASE` is available at pipeline execution time. The value is `${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME}`.

You can access it via template rendering in your configuration files as `{{ env.DATAOPS_DATABASE }}`.
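A hypothetical configuration fragment showing the rendering; the `databases:` and `comment:` keys here are illustrative, not a confirmed schema:

```yaml
databases:
  "{{ env.DATAOPS_DATABASE }}": # renders to e.g. DATAOPS_PROD or DATAOPS_FB_FEATURE1
    comment: Default DataOps database for this environment
```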
### DATAOPS_DATABASE_MASTER

The variable `DATAOPS_DATABASE_MASTER` is available at pipeline execution time. The value is the name of the production database and is computed as `${DATAOPS_PREFIX}_${DATAOPS_ENV_NAME_PROD}`.

You can access it via template rendering in your configuration files as `{{ env.DATAOPS_DATABASE_MASTER }}`.
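Since this always resolves to the production database regardless of the pipeline's own environment, one plausible use is reading production data from a non-production pipeline, e.g. in a dbt-style source definition rendered before execution (the source and table names here are hypothetical):

```yaml
# Hypothetical sources file rendered before execution
version: 2
sources:
  - name: production_data
    database: "{{ env.DATAOPS_DATABASE_MASTER }}" # always resolves to the PROD database
    tables:
      - name: customers
```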
### DATAOPS_ENV_NAME

`DATAOPS_ENV_NAME` is computed at runtime based on the environment the pipeline runs for. The value will be one of:

- `DATAOPS_ENV_NAME_PROD`,
- `DATAOPS_ENV_NAME_DEV`,
- `DATAOPS_ENV_NAME_QA`, or
- `FB_${branch_clean}`, e.g. `DATAOPS_ENV_NAME=FB_COMBINED_WF` for the branch `combined-wf`

You can access it via template rendering in your configuration files as `{{ env.DATAOPS_ENV_NAME }}`.
### DATAOPS_NONDB_ENV_NAME

This variable can be used interchangeably with `DATAOPS_ENV_NAME` to namespace account-level Snowflake objects (apart from the default DataOps database) into a pre-existing environment, avoiding a proliferation of roles, warehouses, etc. in feature branch environments. Instead, feature branches can reuse these resources from an existing environment.

If the configuration variable `DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME` is set to the name of an environment (typically `DEV`), the variable `DATAOPS_NONDB_ENV_NAME` will be set to that name for all feature branch environments (i.e. not `PROD`, `QA`, or `DEV`).
#### Example usage

If `DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME` is set to `DEV` (and branch/environment names use the DataOps default values):

| Branch | `DATAOPS_ENV_NAME` | `DATAOPS_NONDB_ENV_NAME` |
|---|---|---|
| `main` | `PROD` | `PROD` |
| `qa` | `QA` | `QA` |
| `dev` | `DEV` | `DEV` |
| `feature1` | `FB_FEATURE1` | `DEV` |
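A sketch of the resource reuse this enables: namespacing an account-level object with `DATAOPS_NONDB_ENV_NAME` instead of `DATAOPS_ENV_NAME`, so every feature branch resolves to the same `DEV` warehouse (the `warehouses:` keys shown are illustrative, not a confirmed schema):

```yaml
warehouses:
  "{{ env.DATAOPS_PREFIX }}_{{ env.DATAOPS_NONDB_ENV_NAME }}_INGESTION":
    warehouse_size: XSMALL
    comment: Shared by all feature branches when DATAOPS_FEATURE_BRANCH_NONDB_ENV_NAME is DEV
```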
## Pipeline stages

The default Pipeline Stages are included from the DataOps Reference Project `stages.yml`. The default stages represent all the stages of execution DataOps.live has seen with customers.

The DataOps template project provides a simplified list of stages:
```yaml
stages:
  - Pipeline Initialisation
  - Vault Initialisation
  - Snowflake Setup
  - Additional Configuration
  - Data Ingestion
  - Source Testing
  - Data Transformation
  - Transformation Testing
  - Generate Docs
  - Clean Up
```
There are still cases where you want to define your own stage names and sequences of stages in your project. Do so by modifying `pipelines/includes/config/stages.yml`. Using the template project stages as a starting point, you can add or remove some of the default stages and provide your own:
```yaml
stages:
  - Pipeline Initialisation # reserved
  - Vault Initialisation # reserved
  - Snowflake Setup # reserved
  # - Additional Configuration
  - Data Ingestion
  - Source Testing
  - Data Transformation
  - Transformation Testing
  - Publish Data # added
  - Generate Docs # reserved
  - Clean Up # reserved
```
Stages marked as `# reserved` should not be removed from the stage definition, as the data product platform depends on them.

The order of the stages is important, as it defines the execution order of jobs: each stage runs sequentially after the previous one.
Because `pipelines/includes/config/stages.yml` is included after the default `base_bootstrap.yml`, the pipeline uses the stage configuration from the project's includes folder rather than the reference project stages.
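As an illustrative sketch of that include ordering (the reference project path and `ref` below are placeholders, not the actual values):

```yaml
include:
  # The reference project supplies the default stages via base_bootstrap.yml
  - project: reference-template-projects/dataops-reference # placeholder path
    ref: main # placeholder ref
    file: /pipelines/includes/base_bootstrap.yml
  # Included afterwards, so this stage list overrides the defaults
  - /pipelines/includes/config/stages.yml
```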