How to Create a Custom Reference Project
Once your DataOps account has multiple projects, you might start thinking about avoiding content repetition. The solution is your custom reference project.
A custom reference project is the ideal place for:
- your custom stage definitions
- a default set of variables every project should use
- pipeline jobs you want to reuse in every project
- utility scripts in Python or Bash that you always want to run
Creating the new reference project
First, identify or create a suitable project group to hold the shared content. A good approach is to create a sub-group called Reference and Templates in the top-level group of your account. Its URL slug defaults to reference-and-templates, giving you a URL like https://app.dataops.live/customer/reference-and-templates.
Grant all your users at least Reporter-level access to this group so that their pipelines can access the reference project.
Once you have a suitable group, create a blank project called Company Reference Project. The resulting URL will be https://app.dataops.live/customer/reference-and-templates/company-reference-project.
Ensure access to the new reference project
Any pipeline in any project that uses your new reference project needs to access the reference project. You accomplish this by using the built-in CI_JOB_TOKEN variable (see the configuration example in the Setting variables section below), which picks up its level of access from the user who started the pipeline. Therefore, all users who will run pipelines must have a minimum of Reporter level access on your reference project.
Any users accessing your top-level customer group in our data product platform (https://app.dataops.live/customer) will already have access to the reference project. However, if you manage users at a lower level, you must ensure that the reference project also has these users as members.
When you create the new reference project, the Allow access to this project with a CI_JOB_TOKEN setting is turned on. To make use of this feature, all projects that will use the new reference project must be added to the allow list configuration.
Alternatively, you can turn off this setting, letting all users with access to the reference project run pipelines that depend on it.
To turn the setting on or off, follow these steps:
- Navigate to your reference project and click Settings → CI/CD.
- Expand the Token Access section.
- Toggle Allow access to this project with a CI_JOB_TOKEN on or off.
Set up bootstrapping for the new reference project
Each DataOps project will always include the /pipelines/includes/bootstrap.yml file from the standard DataOps Reference Project. Doing so avoids having to include every single job in your pipeline definition. Thus, each pipeline typically starts with:
include:
  - /pipelines/includes/bootstrap.yml
Since we want to create a new reference project, we also want to update our default bootstrap configuration.
The new custom reference project will contain a base_bootstrap.yml file that includes the content from the standard DataOps reference project to retain all the usual DataOps goodness.
In your new, blank reference project, create the file pipelines/includes/base_bootstrap.yml with the following content:
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
Setting variables in the new reference project
Create a file pipelines/includes/config/variables.yml in the reference project and set the variable as follows:
variables:
  MY_VAR: my_value
Ensure that the variables.yml is included in your pipelines/includes/base_bootstrap.yml:
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/variables.yml
DATAOPS_EXTRA_REFERENCE_PROJECTS
The first job in any pipeline initializes the pipeline environment by cloning the reference project into the pipeline's workspace.
Therefore, when you create a new, custom reference project, it must also be cloned into each pipeline workspace. DataOps makes this easy by providing a variable, DATAOPS_EXTRA_REFERENCE_PROJECTS, that enables this clone operation.
You must set the variable DATAOPS_EXTRA_REFERENCE_PROJECTS in the reference project, not the standard project.
Additionally, you must ensure that it references the same branch or tag as the include definitions use. For example, if the release tag is v1-ref, the value of DATAOPS_EXTRA_REFERENCE_PROJECTS must reference the tag v1-ref.
Correct tag usage ensures consistency between the pipeline configuration bootstrapped from bootstrap.yml and the reference project ref that is cloned. Otherwise, you may experience hard-to-find bugs when running the project pipelines.
In the file pipelines/includes/config/variables.yml in the reference project, set the variable as follows:
variables:
  ## Also clone this customer reference project (v1-ref)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/customer/reference-and-templates/company-reference-project.git|v1-ref
This variable takes a value of the form URL|REF, where URL is the complete reference project Git URL (including the .git extension) and REF is the branch or tag to clone from. This branch or tag must match the one used for the final released version of the reference project (see below).
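To make the URL|REF form concrete, the following minimal Python sketch splits such a value into its two parts and checks the .git extension. It is illustrative only: parse_reference_project and ReferenceProject are hypothetical names, and nothing here claims to mirror how DataOps itself parses the variable.

```python
from typing import NamedTuple


class ReferenceProject(NamedTuple):
    """The two parts of a URL|REF value (hypothetical helper type)."""
    url: str
    ref: str


def parse_reference_project(value: str) -> ReferenceProject:
    """Split a 'URL|REF' string into its URL and REF parts."""
    # Split on the last '|' so that REF is whatever follows it.
    url, sep, ref = value.rpartition("|")
    if not sep or not url or not ref:
        raise ValueError(f"expected a value of the form URL|REF, got: {value!r}")
    if not url.endswith(".git"):
        raise ValueError(f"the URL should include the .git extension: {url!r}")
    return ReferenceProject(url=url, ref=ref)


# The example value used in this document; ${CI_JOB_TOKEN} is left unexpanded.
example = (
    "https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/"
    "customer/reference-and-templates/company-reference-project.git|v1-ref"
)
parsed = parse_reference_project(example)
print(parsed.ref)  # v1-ref
```

A check like this can run before a release to catch a missing separator or a forgotten .git extension early.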
It's essential to use a branch, or ideally tag, as the source of your reference content rather than just using the main branch, as this gives consistency and control over updates and releases.
Decide on a release tag that you use for all reference project links. The tag v1-ref is used in this document.
DATAOPS_EXTRA_BEFORE_SCRIPTS
It is also possible to enhance and override some DataOps runtime variables using an additional custom before_script, which is activated using this variable.
variables:
  ## Also clone this customer reference project (v1-ref)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/customer/reference-and-templates/company-reference-project.git|v1-ref
  DATAOPS_EXTRA_BEFORE_SCRIPTS: ${DATAOPS_REFERENCE_PROJECT_DIR}/company-reference-project/scripts/before-script.sh
Other variables
You can set other DataOps configuration variables in your custom reference project's variables.yml file. These will override the standard DataOps defaults and provide customized default values for all your projects that use your new reference project.
Adding custom stages to the reference project
It's possible to create a custom stages.yml file in your new reference project, allowing the use of a specific set of stage definitions across all your projects. To do this, copy the file stages.yml from the DataOps Reference Project.
Place the copy at /pipelines/includes/config/stages.yml in your reference project. Then ensure the custom stage definition is included in your bootstrap file:
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/variables.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/stages.yml
Adding custom jobs and base jobs to the reference project
You can add other jobs and base job definitions to the reference project. First, create a custom job definition that uses an image from the GitHub Container Registry:
A reusable job:
  extends:
    - .agent_tag
  stage: "A custom stage"
  image: ghcr.io/namespace/image-name:image-version
You can then decide whether to include the job in every pipeline. If so, add it to pipelines/includes/base_bootstrap.yml:
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/variables.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/stages.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/default/a-reusable-job.yml
Review and finalize the reference project bootstrap
As you have added new files to your new reference project, link them into the base_bootstrap.yml file so they are available in all your projects.
For each file you have added, include a section such as the following in base_bootstrap.yml:
include:
  ...
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/[my-file].yml # change this
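When several files need linking, the repeated entries can be generated rather than typed by hand. The short Python sketch below renders one include entry per file as YAML text; render_include_entry is a hypothetical helper, and the project path and tag are the example values used in this document.

```python
def render_include_entry(project: str, ref: str, file_path: str) -> str:
    """Return one include entry as YAML text, indented for use under 'include:'."""
    return (
        f"  - project: {project}\n"
        f"    ref: {ref}\n"
        f"    file: {file_path}\n"
    )


# Example values from this document; adjust them to your own project and tag.
PROJECT = "customer/reference-and-templates/company-reference-project"
REF = "v1-ref"
files = [
    "/pipelines/includes/config/variables.yml",
    "/pipelines/includes/config/stages.yml",
]

snippet = "".join(render_include_entry(PROJECT, REF, path) for path in files)
print(snippet)
```

Because every entry is rendered with the same REF value, generating them this way also keeps the tag consistent across all entries.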
Don't delete the reference to the base_bootstrap.yml file from the DataOps Reference Project, or the core DataOps functionality it provides will stop working!
Once you have linked all your reference project's pipeline configuration files into base_bootstrap.yml, it will look something like this:
include:
  ##### DataOps Core #####
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  ##### custom company reference project jobs #####
  ## custom variable definitions and overrides
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref # Set this to your release tag
    file: /pipelines/includes/config/variables.yml
  ## custom stages
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref # Set this to your release tag
    file: /pipelines/includes/config/stages.yml
  ...
Releasing your reference project
Before using your new reference project, create your release tag for the current code. If you followed the example, use v1-ref as the tag and not the branch main.
The tag under which you release your reference project must match all the ref: values in the includes within base_bootstrap.yml.
In our example, it was always ref: v1-ref.
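The matching rule can be checked before tagging. The following Python sketch scans the text of a base_bootstrap.yml and reports any ref value for the custom reference project that differs from the release tag. It is a simple line-based check under stated assumptions: find_mismatched_refs is a hypothetical helper, it assumes each project: line is directly followed by its ref: line as in the examples here, and it is not a full YAML parser.

```python
import re


def find_mismatched_refs(bootstrap_text: str, project: str, release_tag: str) -> list[str]:
    """Return the ref values used for `project` that differ from `release_tag`."""
    mismatches = []
    current_project = None
    for line in bootstrap_text.splitlines():
        # Drop comments such as "# Set this to your release tag".
        line = line.split("#", 1)[0].rstrip()
        project_match = re.match(r"\s*-\s*project:\s*(\S+)", line)
        if project_match:
            current_project = project_match.group(1)
            continue
        ref_match = re.match(r"\s*ref:\s*(\S+)", line)
        if ref_match and current_project == project and ref_match.group(1) != release_tag:
            mismatches.append(ref_match.group(1))
    return mismatches


# Sample content with one deliberately wrong ref (main) to show a finding.
sample = """\
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref # Set this to your release tag
    file: /pipelines/includes/config/variables.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: main
    file: /pipelines/includes/config/stages.yml
"""

result = find_mismatched_refs(
    sample, "customer/reference-and-templates/company-reference-project", "v1-ref"
)
print(result)  # ['main']
```

An empty list means every include of the custom reference project already points at the release tag. Entries for the standard DataOps reference project are ignored because they legitimately use a different ref.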
Now, release the reference project.
Testing and using the new reference project
You can test it using one of your existing projects or a new project from the standard template.
Switch to the new reference project
To switch a project from the standard reference project to your new custom reference project, follow these steps:
- Update bootstrap.yml
  In the project's /pipelines/includes/bootstrap.yml, update the project and ref to match your new reference project's URL path and release tag.
  Before (pipelines/includes/bootstrap.yml):
  include:
    - project: reference-template-projects/dataops-template/dataops-reference
      ref: 5-stable
      file: /pipelines/includes/base_bootstrap.yml
    ...
  After (pipelines/includes/bootstrap.yml):
  include:
    - project: customer/reference-and-templates/company-reference-project
      ref: v1-ref
      file: /pipelines/includes/base_bootstrap.yml
    ...
- Remove any local content that's now in the reference project
  Since a common reason for creating a custom reference project is to move duplicated content out of DataOps projects, ensure you do so in each project. Otherwise, the local configuration will override that from your reference project, which may not be immediately apparent as it will probably be the same code right now.
- Test it
  Set up and run a pipeline to verify that everything in your new configuration works correctly in the repointed project.
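Finding local content that now lives in the reference project can be partly automated. The Python sketch below lists files that exist at the same relative path under pipelines/includes in both a project checkout and a reference project checkout. It is only a heuristic and an assumption of this example: matching paths are candidates for review, and duplicated definitions stored under different file names will not be detected. find_overlapping_files is a hypothetical helper, and the demo uses temporary directories in place of real checkouts.

```python
import tempfile
from pathlib import Path


def find_overlapping_files(project_dir, reference_dir, subdir="pipelines/includes"):
    """Return relative paths present under `subdir` in both directory trees."""

    def relative_files(root):
        base = Path(root) / subdir
        if not base.is_dir():
            return set()
        return {p.relative_to(root).as_posix() for p in base.rglob("*") if p.is_file()}

    return sorted(relative_files(project_dir) & relative_files(reference_dir))


# Small self-contained demo with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my-project"
    reference = Path(tmp) / "company-reference-project"
    layout = (
        (project, ["config/stages.yml", "bootstrap.yml"]),
        (reference, ["config/stages.yml", "base_bootstrap.yml"]),
    )
    for root, names in layout:
        for name in names:
            target = root / "pipelines" / "includes" / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("# placeholder\n")
    overlaps = find_overlapping_files(project, reference)
    print(overlaps)  # ['pipelines/includes/config/stages.yml']
```

Each reported path still needs a manual look to decide whether the local copy should be removed.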
Using a job from the reference project
Your new pipeline definitions will now look like:
include:
  ## Your updated bootstrap file
  ### this also includes all the jobs you always wanted to include in each pipeline
  - /pipelines/includes/bootstrap.yml
  ## The standard Snowflake Object Lifecycle Engine jobs (SOLE)
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/default/snowflake_lifecycle.yml
  ## a custom, reusable job from your reference project, not always included
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/default/my-custom-job.yml
Using content from your reference project
Suppose you have included additional content in your reference project outside the pipeline configuration files, e.g., dbt libraries for use in MATE jobs. In that case, you may need to explicitly include that content before using it. The DataOps pipeline initialization will clone your reference project into each pipeline's workspace at the following location:
${DATAOPS_REFERENCE_PROJECT_DIR}/company-reference-project
This allows you to write a job that depends on content from your company reference project. This example runs a Python application from the scripts directory:
Job using reference project resource:
  extends:
    - .agent_tag
  stage: "A custom stage"
  image: $DATAOPS_PYTHON3_RUNNER_IMAGE
  variables:
    DATAOPS_RUN_PYTHON_SCRIPT: ${DATAOPS_REFERENCE_PROJECT_DIR}/company-reference-project/scripts/my.py
  script:
    - /dataops
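For illustration, scripts/my.py could be as small as the following sketch. Its content is entirely hypothetical: it only shows a script locating its sibling files in the cloned reference project, and it assumes that DATAOPS_REFERENCE_PROJECT_DIR is visible to the script as an environment variable at run time.

```python
import os
from pathlib import Path


def reference_script_dir(env=os.environ) -> Path:
    """Return the scripts directory of the cloned company reference project."""
    base = env.get("DATAOPS_REFERENCE_PROJECT_DIR")
    if not base:
        raise RuntimeError("DATAOPS_REFERENCE_PROJECT_DIR is not set")
    return Path(base) / "company-reference-project" / "scripts"


def main(env=os.environ) -> None:
    """Report where the reference project scripts live, or why they cannot be found."""
    try:
        scripts = reference_script_dir(env)
    except RuntimeError as exc:
        # A real script would probably exit with a non-zero status here.
        print(f"error: {exc}")
        return
    print(f"Using reference project scripts in: {scripts}")


if __name__ == "__main__":
    main()
```

Passing the environment in as a parameter keeps the path logic easy to test outside a pipeline.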