DataOps Project Structure

A standard DataOps project contains a small set of key files and folders that you will use regularly. DataOps.live recommends that you initialize your projects by cloning the DataOps Template Project.

Doing so will provide you with the following layout:

/
├── dataops/
|   ├── modelling/
|   ├── snowflake/
|   └── profiles/
|
├── pipelines/
|   └── includes/
|       ├── config/
|       |   ├── agent_tag.yml
|       |   ├── stages.yml
|       |   └── variables.yml
|       |
|       ├── local_includes/
|       |   ├── modelling_and_transformation/*.yml
|       |   └── *_job.yml
|       |
|       └── bootstrap.yml
|
├── vault-content/
|
└── full-ci.yml

dataops/

This directory contains the configurations for the DataOps Snowflake Object Lifecycle Engine (SOLE) and Modelling and Transformation (MATE) Engine that run within the project's pipelines.

dataops/modelling/

Every project has a Modelling and Transformation (MATE) Engine root directory, equivalent to a dbt project directory, that contains the project definition file and all sources, models, seeds, tests, and other configurations.
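As a rough illustration, and assuming the project definition file follows the standard dbt dbt_project.yml format (dbt 1.x key names), a minimal version might look like the sketch below. The project name, profile name, and paths are hypothetical placeholders, not values taken from the template project.

```yaml
# dataops/modelling/dbt_project.yml (illustrative sketch; all names are hypothetical)
name: my_mate_project          # hypothetical project name
version: "1.0.0"
config-version: 2
profile: my_profile            # hypothetical; must match a profile available to the engine

# Standard dbt 1.x locations for each resource type, relative to this file
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]

models:
  my_mate_project:
    +materialized: view        # default materialization for models in this project
```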

dataops/snowflake/

All projects also have a Snowflake Object Lifecycle Engine (SOLE) configuration directory. This holds all the YAML configuration files that determine how SOLE will build all the Snowflake objects in your project's Snowflake account.

dataops/profiles/

This optional directory stores a custom profiles.template.yml file, which lets you customize the dbt profiles used by MATE. If the file is not present, a default profile is used.

pipelines/includes/

This is the location for all pipeline configuration files, which contain the core project configuration (variables, agent tag, etc.) and specify how each job is set up.

pipelines/includes/bootstrap.yml

The bootstrap.yml file pulls in all config and default files and gets a project ready to run.

pipelines/includes/config/

The YAML files in this directory control the shared main configuration of the project, adding to and overriding variables set in the reference project. Key files are:

  • agent_tag.yml - Specifies the name of the DataOps Runner that will execute this project's pipeline jobs.
  • stages.yml - Overrides the default stage names from the reference project.
  • variables.yml - Sets the main configuration variables.
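The three files above might look roughly like the following sketch, shown as separate YAML documents for brevity. It assumes GitLab CI-style keys (tags, stages, variables) and a hidden .agent_tag job that other jobs can extend. The runner name, stage names, and variable names are all hypothetical.

```yaml
# pipelines/includes/config/agent_tag.yml
# Hypothetical runner name; use the tag of your own DataOps Runner.
.agent_tag:
  tags:
    - my-dataops-runner
---
# pipelines/includes/config/stages.yml
# Hypothetical stage names, listed in execution order.
stages:
  - Pipeline Initialisation
  - Snowflake Setup
  - Data Transformation
---
# pipelines/includes/config/variables.yml
# Hypothetical variable names and values.
variables:
  MY_PROJECT_PREFIX: DEMO
  MY_DEFAULT_WAREHOUSE: DEMO_WH
```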

pipelines/includes/local_includes/

This is where job definitions are usually stored, typically using one file for each job or base job following the naming convention *_job.yml. Subdirectories can be used in more complex projects to organize jobs for easier navigation. One example of that is the folder modelling_and_transformation/ containing all MATE test, transformation, and documentation generation jobs.
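A single job file following the *_job.yml convention could be sketched as below. This assumes GitLab CI-style job keys (extends, stage, script). The file name, job name, stage name, and the .agent_tag base job are hypothetical and would need to match your own project configuration.

```yaml
# pipelines/includes/local_includes/hello_job.yml (hypothetical file)
"Hello World":
  extends:
    - .agent_tag                 # assumed hidden job that selects the DataOps Runner
  stage: Data Transformation     # hypothetical; must be a stage defined for the pipeline
  script:
    - echo "Hello from a DataOps job"
```

Keeping one job per file makes it easy to include or exclude individual jobs from a pipeline definition.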

vault-content/

This directory usually contains a single file, vault.template.yml, which defines an additional set of vault configurations that supplement or re-map the sensitive values obtained from the DataOps Runner's vault files or the configured secrets manager. The file is rendered as a template, so existing vault values can be expanded into the new entries.
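A re-mapping of this kind might be sketched as follows, assuming Jinja-style {{ ... }} placeholders for template rendering. Both the source path SNOWFLAKE.INGEST.USERNAME and the target keys under MY_APP are hypothetical and only illustrate the idea of copying an existing vault value to a new location.

```yaml
# vault-content/vault.template.yml (illustrative sketch; all keys are hypothetical)
MY_APP:
  # Re-map an existing vault value to the key a job expects
  SNOWFLAKE_USER: "{{ SNOWFLAKE.INGEST.USERNAME }}"
  # Supplement the vault with a non-sensitive static value
  DEFAULT_ROLE: DEMO_ROLE
```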

full-ci.yml

full-ci.yml is the default pipeline definition. A project can contain other pipeline definitions in additional -ci.yml files, and all -ci.yml files must be located at the root of the project file structure. A pipeline definition usually just includes job definitions from pipelines/includes/local_includes/ or from the DataOps Reference Project.
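A minimal pipeline definition might therefore look like the sketch below, assuming a GitLab CI-style include list. The job file name is hypothetical, and includes from the DataOps Reference Project are omitted because their exact paths depend on your installation.

```yaml
# full-ci.yml (illustrative sketch)
include:
  # Pull in the shared configuration and defaults first
  - /pipelines/includes/bootstrap.yml
  # Then add one include per job definition (hypothetical file name)
  - /pipelines/includes/local_includes/hello_job.yml
```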