
How to Create a Custom Reference Project

Once your DataOps account has multiple projects, you will probably want to stop repeating the same content in each of them. A custom reference project solves this by holding that shared content in one place.

Creating the reference project

First, identify or create a suitable project group to hold the shared content. A good approach is to create a group at the top level of your account called Reference and Templates.

Reporter access

All your users need Reporter-level access to this group so that their pipelines can access the reference project.

Once you have a suitable group, create an empty project in it called CUSTOMER Reference Project, replacing CUSTOMER here and throughout this page with your organization's name.

Reference project access

Any pipeline in any project that uses your new reference project needs to access the reference project. You accomplish this by using the built-in CI_JOB_TOKEN variable (see the configuration example in the Setting variables section below), which picks up its level of access from the user who started the pipeline. Therefore, all users who will run pipelines must have a minimum of Reporter level access on your reference project.

Please note

Any users accessing your top-level customer group in our data product platform (e.g., https://app.dataops.live/CUSTOMER) will already have access to the reference project. However, if you manage users at a lower level, you must ensure that the reference project also has these users as members.

When you create the new reference project, the Allow access to this project with a CI_JOB_TOKEN setting is turned on by default. While it is on, every project that will use the new reference project must be added to the setting's allowlist.

Alternatively, you can turn off this setting, letting all users with access to the reference project run pipelines that depend on it.

To turn the setting on or off, follow these steps:

  1. Navigate to your reference project and click Settings → CI/CD.
  2. Expand the Token Access section.
  3. Toggle Allow access to this project with a CI_JOB_TOKEN on or off.

Screenshot: Settings → CI/CD → Token Access

Setting up bootstrapping

All standard DataOps projects bootstrap from their reference project (by default, the DataOps Reference Project). They do this by including the reference project's base_bootstrap.yml file in their local bootstrap.yml file.

The new custom reference project will contain a base_bootstrap.yml file that includes the content from the standard DataOps reference project (to retain all the usual DataOps goodness).

In your new, blank reference project, create the following file:

pipelines/includes/base_bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

Setting variables

DATAOPS_EXTRA_REFERENCE_PROJECTS

The first job in any pipeline initializes the pipeline environment by cloning the reference project into the pipeline's workspace. Your custom reference project is an additional reference project, so it must also be cloned into each pipeline workspace. DataOps provides the DATAOPS_EXTRA_REFERENCE_PROJECTS variable to perform this clone.

Caution!

You must set the variable DATAOPS_EXTRA_REFERENCE_PROJECTS in the reference project, not in each standard project. The variable must also reference the same branch or tag that your projects use when they include the reference project in bootstrap.yml. For example, if your release tag is v1.0.0, the value of DATAOPS_EXTRA_REFERENCE_PROJECTS must reference the tag v1.0.0.

Using the correct tag keeps the pipeline configuration bootstrapped from bootstrap.yml consistent with the reference project content cloned into the workspace. Otherwise, you may run into hard-to-find bugs in your project pipelines.

In the reference project, create the following file and set the variable:

pipelines/includes/config/variables.yml
variables:
  ## Also clone this CUSTOMER reference project (v1.0.0)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/CUSTOMER/reference-and-templates/CUSTOMER-reference-project.git|v1.0.0

This variable takes a value of the form URL|REF. URL is the complete Git URL of the reference project (including the .git extension), and REF is the branch or tag to clone. This branch or tag must match the one used for the final released version of the reference project (see below).
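The sketch below shows how the REF part lines up with the ref that a consuming project uses. It assumes the example URL and the v1.0.0 release tag used on this page. The commented-out include at the end shows what a project's bootstrap.yml would contain; it is not part of variables.yml.

```yaml
# Reference project: pipelines/includes/config/variables.yml
variables:
  # Value format: <URL>|<REF>
  #   <URL> = full Git URL of the reference project, ending in .git,
  #           authenticated with the pipeline's CI_JOB_TOKEN
  #   <REF> = branch or tag to clone (the release tag v1.0.0 here)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/CUSTOMER/reference-and-templates/CUSTOMER-reference-project.git|v1.0.0

# Each consuming project's pipelines/includes/bootstrap.yml uses the same ref,
# so the cloned content matches the bootstrapped configuration:
# include:
#   - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
#     ref: v1.0.0
#     file: /pipelines/includes/base_bootstrap.yml
```

When you cut a later release, for example v1.1.0, update the REF suffix and every matching ref: value together.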

Please note

It's essential to use a dedicated branch or, ideally, a tag as the source of your reference content rather than the main branch. This gives you consistency and control over updates and releases.

Decide on a release tag (this document uses v1.0.0) and use it for all reference project links.

DATAOPS_EXTRA_BEFORE_SCRIPTS

You can also extend and override some DataOps runtime variables with an additional custom before_script, which you activate by setting this variable.
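Below is a minimal sketch. It assumes the variable takes the path of a shell script to run as the extra before_script. It also assumes that the script lives in your reference project under a hypothetical scripts/customer_before_script.sh. Both the value format and the file name are assumptions, so check them against the DataOps documentation for your version.

```yaml
# Reference project: pipelines/includes/config/variables.yml
variables:
  # ASSUMPTION: the variable points at a script inside the cloned reference project.
  # scripts/customer_before_script.sh is a hypothetical file name.
  DATAOPS_EXTRA_BEFORE_SCRIPTS: $CI_PROJECT_DIR/reference-projects/CUSTOMER-reference-project/scripts/customer_before_script.sh
```

The path is built on the workspace location where pipeline initialization clones your reference project, described under Testing the new configuration.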

Other variables

You can set other DataOps configuration variables in your custom reference project's variables.yml file. These will override the standard DataOps defaults and provide customized default values for all your projects that use your new reference project.
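The sketch below sets an organization-wide default, assuming your projects use the standard DATAOPS_PREFIX variable. The value ACME is hypothetical. The sketch also defines a hypothetical organization-specific variable:

```yaml
# Reference project: pipelines/includes/config/variables.yml
variables:
  # (DATAOPS_EXTRA_REFERENCE_PROJECTS from the previous step also lives here.)

  # Organization-wide default for a standard DataOps variable
  # (ASSUMPTION: DATAOPS_PREFIX; ACME is a hypothetical value).
  DATAOPS_PREFIX: ACME

  # Hypothetical variable inherited by every project that uses the reference project.
  CUSTOMER_DATA_OWNER: data-platform-team
```

A project that needs a different value can still set the variable in its own variables.yml, because local configuration overrides what comes from the reference project.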

Configuring stages and jobs

You can create a custom stages.yml file in your new reference project to use one set of stage definitions across all your projects. To do this, copy the stages.yml file from the DataOps Reference Project to the same location in your reference project (pipelines/includes/config/stages.yml).
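As an illustration only, a customized stages.yml might add an organization-specific stage to the copied list. The standard stage names shown are an abbreviated, assumed subset, and the CUSTOMER Data Quality stage is hypothetical. Start from the real file rather than from this sketch.

```yaml
# Reference project: pipelines/includes/config/stages.yml
# Abbreviated, illustrative list: copy the full list from the DataOps Reference Project.
stages:
  - Pipeline Initialisation
  - Data Ingestion
  - Data Transformation
  - CUSTOMER Data Quality   # hypothetical organization-specific stage
  - Clean Up
```

In GitLab CI, the order of the stages list is the execution order, and every job's stage must appear in it.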

You can add other jobs and base job definitions to the reference project. Use the standard file locations.
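For instance, a reference project could hold a hidden base job that project jobs reuse through extends. A hidden job is one whose name starts with a dot, and GitLab CI never runs it on its own. The file name, job names, variable, and stage below are hypothetical.

```yaml
# Reference project: hypothetical pipelines/includes/config/customer_base_jobs.yml
# (link it into base_bootstrap.yml like the other files)
.customer_base_job:
  retry: 1                                    # retry a failed job once
  variables:
    CUSTOMER_DATA_OWNER: data-platform-team   # hypothetical shared variable

# In a consuming project's own job file:
Customer Example Job:
  extends: .customer_base_job                 # inherits retry and variables
  stage: Data Transformation                  # assumed stage name; must exist in stages.yml
  script:
    - echo "Owner is $CUSTOMER_DATA_OWNER"
```

Keeping shared settings in the hidden job means a change in the reference project reaches every project at its next release, without editing each job.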

Updating the bootstrap

Link each new file you have added to your reference project into the base_bootstrap.yml file so that it is available in all your projects.

For each file you have added, include a section such as the following in base_bootstrap.yml:

pipelines/includes/base_bootstrap.yml
include:
  ...

  ## CUSTOMER variable definitions and overrides
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0 # Set this to your release tag
    file: /pipelines/includes/config/variables.yml

Important

Don't delete the reference to the base_bootstrap.yml file from the DataOps Reference Project, or essential DataOps pipeline functionality will stop working!

Once you have linked all your reference project's pipeline configuration files into base_bootstrap.yml, it will look something like this:

pipelines/includes/base_bootstrap.yml
include:

  ##### DataOps Core #####

  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

  ##### CUSTOMER Custom #####

  ## CUSTOMER variable definitions and overrides
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0 # Set this to your release tag
    file: /pipelines/includes/config/variables.yml

  ## CUSTOMER stages
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0 # Set this to your release tag
    file: /pipelines/includes/config/stages.yml

  ...

Releasing your reference project

Before using your new reference project, you must create your release tag (see above) from the current code. For this first release, that usually means tagging the branch you've been developing on. For future changes, a dev/test/merge request workflow is a better way to make and release reference project changes.

Take care

The tag under which you release your reference project must match all the ref: values in the includes within base_bootstrap.yml.

Updating your projects and templates

Once the reference project is released, you can test it with one of your existing projects or with a new project created from the standard template.

To switch a project from the standard reference project to your new custom reference project, follow these two steps:

1. Update bootstrap.yml

In the project's bootstrap.yml, update the project and ref to match your new reference project's URL path and release tag:

Before:

pipelines/includes/bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

...

After:

pipelines/includes/bootstrap.yml
include:
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0
    file: /pipelines/includes/base_bootstrap.yml

...

2. Remove any local content that's now in the reference project

A common reason for creating a custom reference project is to move duplicated content out of DataOps projects, so remove that content from each project. Otherwise, the local configuration will override the configuration from your reference project. This may not be immediately apparent, because for now the local copy is probably identical.
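Suppose a project's bootstrap.yml previously included a local copy of stages.yml that now lives in the reference project. In that case, the cleaned-up file might look like the sketch below. Which local includes your projects actually have will vary.

```yaml
# Project: pipelines/includes/bootstrap.yml after repointing
include:
  - project: CUSTOMER/reference-and-templates/CUSTOMER-reference-project
    ref: v1.0.0
    file: /pipelines/includes/base_bootstrap.yml

  # Still project-specific, so it stays local:
  - /pipelines/includes/config/variables.yml

  # Removed: stages now come from the reference project, and a local
  # copy would silently override them.
  # - /pipelines/includes/config/stages.yml
```

Delete the local stages.yml file as well, so the project no longer carries a stale copy.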

Testing the new configuration

Set up and run a pipeline in the repointed project to verify that your new configuration works correctly.

You may have included additional content in your reference project outside the pipeline configuration files, such as dbt libraries for use in MATE jobs. If so, you may need to reference that content explicitly before you can use it. The DataOps pipeline initialization clones your reference project into each pipeline's workspace at the following location:

$CI_PROJECT_DIR/reference-projects/CUSTOMER-reference-project
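As one example, a dbt package stored in the reference project could be installed into a MATE project as a dbt local package. This sketch assumes a hypothetical dbt/customer_utils folder that contains its own dbt_project.yml. It also assumes that CI_PROJECT_DIR is visible to dbt in the job environment.

```yaml
# packages.yml in the project's dbt (MATE) project
packages:
  # dbt renders env_var() in packages.yml, so the path follows the pipeline workspace.
  # dbt/customer_utils is a hypothetical folder in the reference project.
  - local: "{{ env_var('CI_PROJECT_DIR') }}/reference-projects/CUSTOMER-reference-project/dbt/customer_utils"
```

Running dbt deps then installs the package, so its macros and models become available to the project.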