
DataOps Project Structure

The standard DataOps project has key files and folders that you will commonly use. DataOps.live recommends that you initialize your projects by cloning the DataOps Template Project.

Doing so will provide you with the following layout:

/
├── dataops/
|   ├── modelling/
|   └── snowflake/
|
├── pipelines/
|   └── includes/
|       ├── config/
|       |   ├── agent_tag.yml
|       |   ├── stages.yml
|       |   └── variables.yml
|       |
|       ├── local_includes/
|       |   ├── default jobs.yml
|       |
|       |
|       └── bootstrap.yml
|
├── vault-content/
|
└── full-ci.yml

dataops/

This directory contains the configurations for the DataOps Snowflake Object Lifecycle Engine (SOLE) and Modelling and Transformation (MATE) Engine that run within the project's pipelines.

dataops/modelling/

Every project has a Modelling and Transformation (MATE) Engine root directory, equivalent to a dbt project directory, that contains the project definition file and all sources, models, seeds, tests, and other configuration.
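As a rough illustration of the project definition file mentioned above, a minimal dbt-style dbt_project.yml could look like the sketch below. The project name, profile name, and folder names are placeholders rather than values from the template project, and a real MATE directory may set additional or differently named keys depending on the dbt version in use.

```yaml
# dataops/modelling/dbt_project.yml (illustrative sketch; all values are placeholders)
name: my_project            # hypothetical project name
version: "1.0.0"
config-version: 2
profile: my_profile         # hypothetical profile name

# Where dbt looks for each kind of resource, relative to this directory
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]

models:
  my_project:
    +materialized: view     # default materialization for models in this project
```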

dataops/snowflake/

All projects also have a Snowflake Object Lifecycle Engine (SOLE) configuration directory. This holds all the YAML configuration files that determine how SOLE will build all the Snowflake objects in your project's Snowflake account.

pipelines/includes/

This is the location for all pipeline configuration files, which contain the core project configuration (variables, agent tag, etc.) and specify how each job is set up.

pipelines/includes/bootstrap.yml

The bootstrap.yml file pulls in all config and default files and gets a project ready to run.
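To make the idea of "pulling in" files concrete, here is a minimal sketch of what such a file might contain, assuming a GitLab-CI-style include syntax. The reference project path, ref, and base file name are placeholders, not confirmed values; only the three config file paths come from the layout described on this page.

```yaml
# pipelines/includes/bootstrap.yml (illustrative sketch)
include:
  # Shared base configuration from the reference project.
  # The project path, ref, and file name are hypothetical placeholders.
  - project: "path/to/dataops-reference"
    ref: "main"
    file: "/pipelines/includes/base_bootstrap.yml"

  # Project-level configuration files from this repository
  - /pipelines/includes/config/agent_tag.yml
  - /pipelines/includes/config/variables.yml
  - /pipelines/includes/config/stages.yml
```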

pipelines/includes/config/

The YAML files in this directory control the shared main configuration of the project, adding to and overriding variables set in the reference project. Key files are:

  • agent_tag.yml - Specifies the name of the DataOps Runner that will execute this project's pipeline jobs.
  • stages.yml - Overrides the default stage names from the reference project.
  • variables.yml - Sets the main configuration variables.
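The three files listed above could be sketched as follows, shown here as separate YAML documents in one listing. This assumes GitLab-CI-style keywords (tags, stages, variables); the runner name, stage names, and variable are hypothetical and should be replaced with the values for your own project.

```yaml
# pipelines/includes/config/agent_tag.yml
# A hidden job that other jobs can extend to pick up the runner tag.
.agent_tag:
  tags:
    - my-dataops-runner        # hypothetical runner name
---
# pipelines/includes/config/stages.yml
# Stage names below are hypothetical examples.
stages:
  - Pipeline Initialisation
  - Snowflake Setup
  - Data Transformation
---
# pipelines/includes/config/variables.yml
variables:
  MY_PROJECT_PREFIX: DEMO      # hypothetical variable
```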

pipelines/includes/local_includes/

This is where job definitions are usually stored, typically using one file for each job (or base job). Subdirectories can be used in more complex projects to organize jobs for easier navigation.
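A one-job-per-file layout could look like the following sketch. The file name, job name, stage, and variable are all hypothetical, and the job assumes that a hidden .agent_tag job supplying the runner tag is defined elsewhere in the project configuration.

```yaml
# pipelines/includes/local_includes/my_job.yml (hypothetical file name)
"My Example Job":
  extends:
    - .agent_tag               # assumed hidden job that supplies the runner tag
  stage: Data Transformation   # must match a stage defined for the pipeline
  variables:
    MY_JOB_SETTING: example    # hypothetical job-level variable
  script:
    - echo "Running my example job"
```

Keeping each job in its own file means a pipeline definition can pick up only the jobs it needs by including the relevant files.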

vault-content/

Usually contains a single file, vault.template.yml, which defines additional vault configuration to supplement and/or re-map the sensitive values acquired from the DataOps Runner's vault files or the configured secrets manager.

full-ci.yml

full-ci.yml is the default pipeline definition; a project may also contain other -ci.yml files. All of these files must be located at the root of the project file structure, but they usually just contain references to other files, located either in pipelines/includes/local_includes or in the DataOps Reference Project.
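A pipeline definition that only references other files might look like the sketch below, again assuming a GitLab-CI-style include syntax. The local job file name and the reference project path, ref, and file are hypothetical placeholders.

```yaml
# full-ci.yml (illustrative sketch)
include:
  # Project bootstrap: configuration and defaults
  - /pipelines/includes/bootstrap.yml

  # A job defined in this project (hypothetical file name)
  - /pipelines/includes/local_includes/my_job.yml

  # A job taken from the DataOps Reference Project
  # (project path, ref, and file name are placeholders)
  - project: "path/to/dataops-reference"
    ref: "main"
    file: "/pipelines/includes/default/some_job.yml"
```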