DataOps Reference Project
The DataOps reference project, located in our data product platform, contains a set of standard definitions, jobs, and other project content that is automatically included in every DataOps project.
Why have a reference project?
Most files needed for a DataOps project are standard configuration and job definition files that rarely change. Therefore, it does not make sense to recreate these files every time you create a new DataOps project. An early solution was to collate these files into a standard project template and duplicate them into each new project.
However, it quickly became apparent that the ongoing cost of maintaining and updating these duplicated files in every project would become a significant limiting factor to development within the data product platform. Therefore, we created the DataOps Reference Project to hold all of these files in one place. Each new project then inherits from this reference project.
If you have multiple DataOps projects, you can also create custom reference projects to avoid repeating content. For more information, see How to Create a Custom Reference Project.
How is the reference project implemented?
The DataOps Reference Project has an access level of internal so that only authenticated users have read-only access to it.
The following standard content is maintained in the reference project:
- Default values for the main DataOps variables
- Additional variables that provide icons and orchestrator image names
- A before_script that runs at the start of every job to set the runtime variables
- Standard job configurations for pipeline initialization, secrets loading, Snowflake setup, and so on
- Base jobs to allow simpler project job definitions
- Execution rules, allowing branch-based job control, implemented as base jobs
Each new project created from the DataOps template project includes a bootstrap.yml file. This file is included in all pipeline files and loads all the reference project files.
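As a rough illustration only, the relationship between a pipeline file, bootstrap.yml, and the reference project might look like the sketch below. It assumes a GitLab-CI-style include syntax, which this page does not confirm. The pipeline file name, all paths, the project path, and the ref are hypothetical placeholders, not the platform's actual values.

```yaml
# Hypothetical pipeline file in your project, for example full-ci.yml
include:
  # Assumed location of the bootstrap file inside your project
  - local: "/pipelines/includes/bootstrap.yml"
---
# Hypothetical content of pipelines/includes/bootstrap.yml
include:
  # Assumed form: load a file from the reference project by project path and ref
  - project: "example-group/example-reference-project"   # placeholder project path
    ref: "main"                                          # placeholder ref
    file: "/pipelines/includes/example_base_jobs.yml"    # placeholder file path
```

The point of the indirection is that pipeline files only ever name bootstrap.yml, so what gets loaded from the reference project can change without touching each pipeline file.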
What are the reference project benefits?
Most of the standard pipeline configuration and content is managed from within the reference project. This allows us to support new features or fix issues by updating a single location, and those changes reach all your pipelines seamlessly the next time they run.
How do you override the standard configurations in the reference project?
For simple changes, such as adding or removing stages, modify your project settings.
However, for more complex changes like tweaking any Snowflake Object Lifecycle Engine (SOLE) jobs, the best way is to copy the relevant file from the reference project implementation to your project and make changes in this file. The workflow for this is as follows:
- Copy the file in question from the reference project to the same location in your project, following the project structure
- Make the changes you need to this file
- Edit your project's bootstrap.yml to alter the changed file's link from the reference project to its new local location
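The last step of this workflow could be sketched as follows. This is a minimal illustration that assumes bootstrap.yml uses GitLab-CI-style include entries, which this page does not confirm. The project path, ref, and file path are hypothetical placeholders, so check your own bootstrap.yml for the real entries.

```yaml
# Hypothetical excerpt of your project's bootstrap.yml
include:
  # Before: the file was loaded from the reference project (placeholder values)
  # - project: "example-group/example-reference-project"
  #   ref: "main"
  #   file: "/pipelines/includes/example_sole_jobs.yml"

  # After: the same path now points at the copy stored in your own project
  - local: "/pipelines/includes/example_sole_jobs.yml"
```

Keeping the local copy at the same path as in the reference project makes it easier to compare the two versions later when you merge upstream updates by hand.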
Although making such changes is often merited, remember that the copied file becomes something you must support and maintain yourself. It no longer receives standard updates from the reference project unless you merge them manually.