How to Create a Custom Reference Project
Once your DataOps account has more than one project, you will probably want to avoid repeating the same content in each of them. This is where a custom reference project comes in.
Creating the reference project
First, identify or create a suitable group to hold shared content. A good approach is to create a group at the top level of your account called something like Reference and Templates.
Grant all your users Reporter-level access to this group so that their pipelines can access the reference project content.
Once you have a suitable group, create an empty project called %CUSTOMER% Reference Project.
Setting up bootstrapping
All standard DataOps projects bootstrap from their reference project (by default, the DataOps Reference Project) by including the reference project's base_bootstrap.yml file in their local bootstrap.yml file.
Your new custom reference project will contain its own base_bootstrap.yml file, which includes the content of the standard DataOps Reference Project so that all the usual DataOps functionality is retained.
In your new, blank reference project, create the following file:
pipelines/includes/base_bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
Setting variables
DATAOPS_EXTRA_REFERENCE_PROJECTS
The first job in any pipeline initializes the pipeline environment, which includes cloning the reference project into the pipeline's workspace. Because you are creating a new custom reference project, it must also be cloned into each pipeline workspace. DataOps makes this easy by providing a variable, DATAOPS_EXTRA_REFERENCE_PROJECTS, that enables this extra clone operation.
Create the following file in your reference project and set the variable in it:
pipelines/includes/config/variables.yml
variables:
  ## Also clone this CUSTOMER reference project (v1.0.0)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/CUSTOMER/reference-and-templates/CUSTOMER-reference-project.git|v1.0.0
This variable takes a value of the form URL|REF, where URL is the complete Git URL of the reference project (including the .git extension) and REF is the branch or tag to clone.
Use a dedicated release branch or, ideally, a tag as the source of your reference content rather than the main branch. This gives you consistency and control over updates and releases.
Decide on a release tag (we've used v1.0.0 in this document) that will be used for all references to the reference project.
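To make the URL|REF format concrete, here is a minimal Python sketch that splits such a value and applies the checks described above. The helper `parse_extra_reference_project` is hypothetical and only illustrates the format; it is not how DataOps itself parses the variable. It assumes the URL never contains a `|`, so it splits on the last one.

```python
# Hypothetical helper illustrating the URL|REF format of
# DATAOPS_EXTRA_REFERENCE_PROJECTS. Not DataOps's own parser.
import warnings
from typing import NamedTuple


class ExtraReferenceProject(NamedTuple):
    url: str  # complete Git URL, including the .git extension
    ref: str  # branch or tag to clone


def parse_extra_reference_project(value: str) -> ExtraReferenceProject:
    """Split 'URL|REF' on the last '|' and sanity-check both halves."""
    url, sep, ref = value.strip().rpartition("|")
    if not sep or not url or not ref:
        raise ValueError(f"expected URL|REF, got {value!r}")
    if not url.endswith(".git"):
        raise ValueError(f"URL must include the .git extension: {url!r}")
    if ref == "main":
        # The recommendation is a release tag (or dedicated branch), not main.
        warnings.warn("REF is 'main'; prefer a release tag such as v1.0.0")
    return ExtraReferenceProject(url=url, ref=ref)


value = (
    "https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/"
    "CUSTOMER/reference-and-templates/CUSTOMER-reference-project.git|v1.0.0"
)
project = parse_extra_reference_project(value)
print(project.ref)  # v1.0.0
```

Splitting on the last `|` keeps the whole URL intact, including the unexpanded `${CI_JOB_TOKEN}` placeholder.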
DATAOPS_EXTRA_BEFORE_SCRIPTS
You can also enhance and override some DataOps runtime variables with an additional custom before_script, which you activate by setting this variable.
As this is an advanced feature of the DataOps platform, don't hesitate to contact your account representative for further assistance.
Other variables
You can set other DataOps configuration variables in your custom reference project's variables.yml file. These will override the standard DataOps defaults and provide customized default values for all your projects that use your new reference project.
Configuring stages and jobs
You can create a custom stages.yml file in your new reference project to use a specific set of stage definitions across all your projects. To do this, copy the stages.yml file from the DataOps Reference Project to the same location in your reference project (pipelines/includes/config/stages.yml), then edit it as required.
You can add other job and base job definition files to the reference project using the standard file locations.
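If you have local clones of both repositories, the copy step can be scripted. This is a minimal sketch: the two checkout directories are hypothetical and should be replaced with wherever you cloned the DataOps Reference Project and your custom reference project. The relative path matches the stages.yml location used in base_bootstrap.yml.

```python
# Sketch: copy stages.yml from a local checkout of the DataOps Reference Project
# to the identical relative path in a local checkout of your custom reference
# project. Both checkout locations are hypothetical.
import shutil
from pathlib import Path

STAGES_RELATIVE_PATH = Path("pipelines/includes/config/stages.yml")


def copy_stages_file(dataops_reference_checkout: Path,
                     custom_reference_checkout: Path) -> Path:
    """Copy stages.yml to the same location in the custom reference project."""
    source = dataops_reference_checkout / STAGES_RELATIVE_PATH
    target = custom_reference_checkout / STAGES_RELATIVE_PATH
    # Create pipelines/includes/config in the custom project if it is missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copies content and file metadata
    return target
```

For example, `copy_stages_file(Path("dataops-reference"), Path("CUSTOMER-reference-project"))` would copy the file between two sibling checkouts; you then edit the copy and commit it to your reference project.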
Updating the bootstrap
Any new files you add to your reference project must be included from its base_bootstrap.yml file before they are available in all your projects.
For each file you have added, include a section such as the following in base_bootstrap.yml:
pipelines/includes/base_bootstrap.yml
include:
  ...
  ## CUSTOMER variable definitions and overrides
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0 # Set this to your release tag
    file: /pipelines/includes/config/variables.yml
Don't delete the include of the DataOps Reference Project's base_bootstrap.yml file; without it, the standard DataOps functionality your pipelines depend on will stop working.
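When you add several files, generating the include entries keeps the project path and ref consistent. This Python sketch is illustrative only: `include_entries` is a hypothetical helper, and `PROJECT_PATH` and `RELEASE_TAG` are the placeholder values used throughout this document. It produces list items indented to sit under an existing `include:` key.

```python
# Sketch: generate base_bootstrap.yml include entries for files added to the
# custom reference project, all pinned to one release tag.
# PROJECT_PATH and RELEASE_TAG are this document's placeholder values.
PROJECT_PATH = "CUSTOMER/reference-and-templates/CUSTOMER-reference-project"
RELEASE_TAG = "v1.0.0"


def include_entries(files: dict[str, str],
                    project: str = PROJECT_PATH,
                    ref: str = RELEASE_TAG) -> str:
    """Return YAML list items for an include: block, one per file.

    `files` maps a descriptive comment to the file's path in the project.
    """
    lines = []
    for comment, path in files.items():
        lines.append(f"  ## {comment}")
        lines.append(f"  - project: {project}")
        lines.append(f"    ref: {ref} # Set this to your release tag")
        lines.append(f"    file: {path}")
    return "\n".join(lines)


print(include_entries({
    "CUSTOMER variable definitions and overrides": "/pipelines/includes/config/variables.yml",
    "CUSTOMER stages": "/pipelines/includes/config/stages.yml",
}))
```

Paste the output below the DataOps Core entry in base_bootstrap.yml, leaving that entry in place.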
Once you have linked all your reference project's pipeline configuration files into base_bootstrap.yml, it will look something like this:
pipelines/includes/base_bootstrap.yml
include:
  ##### DataOps Core #####
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  ##### CUSTOMER Custom #####
  ## CUSTOMER variable definitions and overrides
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0 # Set this to your release tag
    file: /pipelines/includes/config/variables.yml
  ## CUSTOMER stages
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0 # Set this to your release tag
    file: /pipelines/includes/config/stages.yml
  ...
Releasing your reference project
Before using your new reference project, you must create your release tag (see above) from the current code. Initially, this usually means tagging the branch you've been developing on. Going forward, however, it's better to put a dev/test/merge request (MR) workflow in place for making and releasing reference project changes.
The tag you release your reference project under must match all the ref values in the includes within base_bootstrap.yml.
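A quick pre-release check is to compare every `ref` that points at your custom reference project against the tag you are about to create. This sketch assumes the include list has already been parsed into Python data (for example, by a YAML loader); the data mirrors the base_bootstrap.yml example above, and `mismatched_refs` is a hypothetical helper. The DataOps Core entry is ignored because it targets a different project.

```python
# Sketch: find includes of the custom reference project whose ref differs from
# the release tag. The include list is written as already-parsed data that
# mirrors the base_bootstrap.yml example; mismatched_refs is hypothetical.
CUSTOM_PROJECT = "CUSTOMER/reference-and-templates/CUSTOMER-reference-project"

includes = [
    {"project": "reference-template-projects/dataops-template/dataops-reference",
     "ref": "5-stable", "file": "/pipelines/includes/base_bootstrap.yml"},
    {"project": CUSTOM_PROJECT, "ref": "v1.0.0",
     "file": "/pipelines/includes/config/variables.yml"},
    {"project": CUSTOM_PROJECT, "ref": "v1.0.0",
     "file": "/pipelines/includes/config/stages.yml"},
]


def mismatched_refs(entries: list[dict], project: str, release_tag: str) -> list[str]:
    """Return the files whose include of `project` does not use `release_tag`."""
    return [
        entry["file"]
        for entry in entries
        if entry.get("project") == project and entry.get("ref") != release_tag
    ]


print(mismatched_refs(includes, CUSTOM_PROJECT, "v1.0.0"))  # []
```

An empty list means every include of your reference project matches the tag; a non-empty list names the entries to update before tagging.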
Updating your projects and templates
Now that the reference project has been released, you can test it with one of your existing projects or with a new project created from the standard template.
To switch a project from the DataOps standard reference project to your new custom reference project, follow these two steps:
1. Update bootstrap.yml
In the project's bootstrap.yml, update the project and ref to match your new reference project's URL path and release tag:
pipelines/includes/bootstrap.yml BEFORE
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  ...
pipelines/includes/bootstrap.yml AFTER
include:
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0
    file: /pipelines/includes/base_bootstrap.yml
  ...
2. Remove any local content that's now in the reference project
A common reason for creating a custom reference project is to move duplicated content out of your DataOps projects, so make sure you remove that content from each project. Otherwise, the local configuration will override the configuration from your reference project. This may not be immediately apparent, because the local copy is probably identical to the reference version for now; the difference only shows once you release changes to the reference project.
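The following Python sketch illustrates why a leftover local copy is easy to miss. It uses a deliberately simplified model, assuming configuration is a flat key/value mapping in which a project's local value replaces the reference value for the same key (roughly how variables behave; GitLab CI's merging of full job definitions is more involved). `EXAMPLE_SETTING` and both helpers are hypothetical.

```python
# Simplified model: flat key/value configuration where a local value replaces
# the reference value for the same key. EXAMPLE_SETTING and both helpers are
# hypothetical; real GitLab CI merging of job definitions is more involved.
def effective_config(reference: dict, local: dict) -> dict:
    """Local keys win over reference keys."""
    return {**reference, **local}


def redundant_local_keys(reference: dict, local: dict) -> list[str]:
    """Keys the project defines locally with exactly the reference value."""
    return sorted(k for k, v in local.items() if k in reference and reference[k] == v)


reference_v1 = {"EXAMPLE_SETTING": "a"}
local = {"EXAMPLE_SETTING": "a"}  # leftover local copy of reference content

print(effective_config(reference_v1, local))  # {'EXAMPLE_SETTING': 'a'}: looks fine

reference_v2 = {"EXAMPLE_SETTING": "b"}  # a later reference project release
print(effective_config(reference_v2, local))  # {'EXAMPLE_SETTING': 'a'}: update hidden

print(redundant_local_keys(reference_v1, local))  # ['EXAMPLE_SETTING']
```

Keys reported by `redundant_local_keys` are candidates for removal from the project, since the reference project already supplies the same value.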
Testing the new configuration
Set up and run a pipeline in the repointed project to verify that everything in your new configuration works correctly.
If your reference project contains additional content outside the pipeline configuration files, e.g., dbt libraries for use in MATE jobs, you may need to reference that content explicitly before you can use it. The DataOps pipeline initialization clones your reference project into each pipeline's workspace at the following location:
$CI_PROJECT_DIR/reference-projects/CUSTOMER-reference-project
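If a script in one of your jobs needs to locate this content, it can build the path from `CI_PROJECT_DIR`. This is a minimal sketch under one assumption: that the checkout directory is named after the repository (the last URL path segment without `.git`), which matches the example path above but is not confirmed for every case. The fallback `/builds/project` is a hypothetical value used only when `CI_PROJECT_DIR` is unset, for example when running outside a pipeline.

```python
# Sketch: compute where a reference project is expected in the pipeline
# workspace. Assumption: the directory is named after the repository (last URL
# path segment without .git). The /builds/project fallback is hypothetical.
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse


def reference_project_dir(project_url: str, ci_project_dir: str) -> str:
    """Return <ci_project_dir>/reference-projects/<repository name>."""
    repo_name = PurePosixPath(urlparse(project_url).path).name
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return str(PurePosixPath(ci_project_dir) / "reference-projects" / repo_name)


url = (
    "https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/"
    "CUSTOMER/reference-and-templates/CUSTOMER-reference-project.git"
)
print(reference_project_dir(url, os.environ.get("CI_PROJECT_DIR", "/builds/project")))
```

A job could then check that, for instance, a dbt library directory exists under this path before using it.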