How to Create a Custom Reference Project

Once your DataOps account holds multiple projects, you will likely want to avoid repeating the same content in each one. A custom reference project solves this.

A custom reference project is the ideal place for:

  • your custom stage definitions
  • a default set of variables every project should use
  • pipeline jobs you want to reuse in every project
  • utility scripts in Python or Bash that you always want to run

Creating the new reference project

First, identify or create a suitable project group to hold the shared content. A good approach is to create a sub-group called Reference and Templates under the top-level group of your account. Its URL slug defaults to reference-and-templates, so you end up with a URL like https://app.dataops.live/customer/reference-and-templates.

Reporter access

Grant all your users Reporter-level access to this group so that pipelines can access the reference project.

Once you have a suitable group, create a blank project called Company Reference Project. The resultant URL will be https://app.dataops.live/customer/reference-and-templates/company-reference-project.

Ensure access to the new reference project

Any pipeline in any project that uses your new reference project needs access to it. You accomplish this by using the built-in CI_JOB_TOKEN variable (see the configuration example in the Setting variables section below), which inherits its level of access from the user who started the pipeline. Therefore, all users who will run pipelines must have at least Reporter-level access to your reference project.

Please note

Any users with access to your top-level customer group in our data product platform (https://app.dataops.live/customer) already have access to the reference project. However, if you manage users at a lower level, you must ensure that these users are also members of the reference project.

When you create the new reference project, the Allow access to this project with a CI_JOB_TOKEN setting is turned on. While it is on, you must add every project that will use the new reference project to the allow list configuration.

Alternatively, you can turn off this setting, letting all users with access to the reference project run pipelines that depend on it.

To turn the setting on or off, follow these steps:

  1. Navigate to your reference project and click Settings → CI/CD.
  2. Expand the Token Access section.
  3. Toggle Allow access to this project with a CI_JOB_TOKEN on or off.

Set up bootstrapping for the new reference project

Each DataOps project includes the /pipelines/includes/bootstrap.yml file from the standard DataOps Reference Project. This avoids having to list every single job in your pipeline definition. Thus, each pipeline typically starts with:

full-ci.yml
include:
  - /pipelines/includes/bootstrap.yml

Since we want to create a new reference project, we also want to update our default bootstrap configuration.

The new custom reference project will contain a base_bootstrap.yml file that includes the content from the standard DataOps reference project to retain all the usual DataOps goodness.

In your new, blank reference project, create the file pipelines/includes/base_bootstrap.yml with the following content:

pipelines/includes/base_bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

Setting variables in the new reference project

Create a file pipelines/includes/config/variables.yml in the reference project and set your variables there. For example, to define a variable MY_VAR:

pipelines/includes/config/variables.yml
variables:
  MY_VAR: my_value

Ensure that variables.yml is included in your pipelines/includes/base_bootstrap.yml:

pipelines/includes/base_bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/variables.yml

DATAOPS_EXTRA_REFERENCE_PROJECTS

The first job in any pipeline initializes the pipeline environment by cloning the reference project into the pipeline's workspace. Therefore, when you create a new, custom reference project, it must also be cloned into each pipeline workspace. DataOps makes this easy by providing a variable, DATAOPS_EXTRA_REFERENCE_PROJECTS, that enables this clone operation.

Caution!

You must set the variable DATAOPS_EXTRA_REFERENCE_PROJECTS in the reference project, not the standard project. Additionally, it must reference the same branch or tag that your include definitions use in their ref: values. For example, if the release tag is v1-ref, the value of DATAOPS_EXTRA_REFERENCE_PROJECTS must also reference the tag v1-ref.

Correct tag usage ensures consistency between the pipeline configuration bootstrapped from bootstrap.yml and the reference project ref cloned. Otherwise, you may experience hard-to-find bugs when running the project pipelines.

In the file pipelines/includes/config/variables.yml in the reference project, set the variable as follows:

pipelines/includes/config/variables.yml
variables:
  ## Also clone this customer reference project (v1-ref)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/customer/reference-and-templates/company-reference-project.git|v1-ref

This variable takes a value of the form URL|REF, where URL is the complete Git URL of the reference project (including the .git extension) and REF is the branch or tag to clone from. This branch or tag must match the one used for the final released version of the reference project (see below).
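To make the URL|REF format concrete, here is a minimal Python sketch that splits such a value into its two parts. It only illustrates the shape of the value; the function name split_reference is hypothetical, and this is not a description of how DataOps itself parses the variable.

```python
def split_reference(value: str) -> tuple[str, str]:
    """Split a 'URL|REF' string into its URL and REF parts (illustrative only)."""
    # Split on the last '|' so the REF is everything after it.
    url, sep, ref = value.rpartition("|")
    if not sep or not url or not ref:
        raise ValueError(f"expected a value of the form URL|REF, got: {value!r}")
    return url, ref


# The example value used in this document; ${CI_JOB_TOKEN} is left unexpanded here.
url, ref = split_reference(
    "https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/"
    "customer/reference-and-templates/company-reference-project.git|v1-ref"
)
print(url)
print(ref)
```

In this example the REF part is v1-ref, which is the same tag used in the ref: values of the include entries.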

Please note

It's essential to use a branch, or ideally tag, as the source of your reference content rather than just using the main branch, as this gives consistency and control over updates and releases.

Decide on a release tag that you use for all reference project links. The tag v1-ref is used in this document.

DATAOPS_EXTRA_BEFORE_SCRIPTS

It is also possible to enhance and override some DataOps runtime variables using an additional custom before_script, which is activated using this variable.

pipelines/includes/config/variables.yml
variables:
  ## Also clone this customer reference project (v1-ref)
  DATAOPS_EXTRA_REFERENCE_PROJECTS: https://gitlab-ci-token:${CI_JOB_TOKEN}@app.dataops.live/customer/reference-and-templates/company-reference-project.git|v1-ref
  DATAOPS_EXTRA_BEFORE_SCRIPTS: ${DATAOPS_REFERENCE_PROJECT_DIR}/company-reference-project/scripts/before-script.sh

Other variables

You can set other DataOps configuration variables in your custom reference project's variables.yml file. These will override the standard DataOps defaults and provide customized default values for all your projects that use your new reference project.

Adding custom stages to the reference project

It's possible to create a custom stages.yml file in your new reference project, allowing the use of a specific set of stage definitions across all your projects. To do this, copy the file stages.yml from the DataOps Reference Project.

Place the copy at /pipelines/includes/config/stages.yml in your reference project. Then ensure the custom stage definition is included in your bootstrap file:

pipelines/includes/base_bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/variables.yml

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/stages.yml

Adding custom jobs and base jobs to the reference project

You can add other jobs and base job definitions to the reference project. First, create a custom job definition, in this example one that uses an image from the GitHub Container Registry:

/pipelines/includes/default/a-reusable-job.yml
A reusable job:
  extends:
    - .agent_tag
  stage: "A custom stage"
  image: ghcr.io/namespace/image-name:image-version

You can then decide if you want to always include the job in every pipeline. If you want to do so, add it to pipelines/includes/base_bootstrap.yml:

pipelines/includes/base_bootstrap.yml
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/variables.yml

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/stages.yml

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/default/a-reusable-job.yml

Review and finalize the reference project bootstrap

As you have added new files to your new reference project, link them into the base_bootstrap.yml file so they are available in all your projects.

For each file you have added, include a section such as the following in base_bootstrap.yml:

pipelines/includes/base_bootstrap.yml
include:
  ...

  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/config/[my-file].yml # change this

Important

Don't delete the reference to the base_bootstrap.yml file from the DataOps Reference Project, or essential DataOps functionality will stop working!

Once you have linked all your reference project's pipeline configuration files into base_bootstrap.yml, it will look something like this:

pipelines/includes/base_bootstrap.yml
include:

  ##### DataOps Core #####

  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml

  ##### custom company reference project jobs #####

  ## custom variable definitions and overrides
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref # Set this to your release tag
    file: /pipelines/includes/config/variables.yml

  ## custom stages
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref # Set this to your release tag
    file: /pipelines/includes/config/stages.yml

  ...

Releasing your reference project

Before using your new reference project, create your release tag for the current code. If you followed the example, release under the tag v1-ref rather than pointing projects at the main branch.

Ensure consistent use of the release tag

The tag under which you release your reference project must match all the ref: values in the includes within base_bootstrap.yml.

In our example, it was always ref: v1-ref.
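One way to catch a mismatched ref before releasing is a small check over base_bootstrap.yml. The following Python sketch is a line-based illustration, not a YAML parser and not part of DataOps: it assumes each include entry lists project: before ref:, as in the examples in this document, and the function name mismatched_refs is hypothetical.

```python
import re


def mismatched_refs(bootstrap_text: str, project: str, tag: str) -> list[str]:
    """Return the ref values used for `project` that differ from `tag`."""
    mismatches = []
    current_project = None
    for raw_line in bootstrap_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        match = re.match(r"^(?:-\s*)?(project|ref):\s*(\S+)$", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "project":
            current_project = value
        elif current_project == project and value != tag:
            mismatches.append(value)
    return mismatches


example = """
include:
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/base_bootstrap.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref # Set this to your release tag
    file: /pipelines/includes/config/variables.yml
  - project: customer/reference-and-templates/company-reference-project
    ref: main
    file: /pipelines/includes/config/stages.yml
"""

company_project = "customer/reference-and-templates/company-reference-project"
print(mismatched_refs(example, company_project, "v1-ref"))  # prints ['main']
```

The include for the standard DataOps Reference Project is ignored by the check because it legitimately uses its own ref (5-stable in this document).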

Now, release the reference project.

Testing and using the new reference project

You can test the new reference project using one of your existing projects or a new project created from the standard template.

Switch to the new reference project

To switch a project from the standard reference project to your new custom reference project, follow these three steps:

  1. Update bootstrap.yml

    In the project's /pipelines/includes/bootstrap.yml, update the project and ref to match your new reference project's URL path and release tag:

    Before:

    pipelines/includes/bootstrap.yml
    include:
      - project: reference-template-projects/dataops-template/dataops-reference
        ref: 5-stable
        file: /pipelines/includes/base_bootstrap.yml

    ...

    After:

    pipelines/includes/bootstrap.yml
    include:
      - project: customer/reference-and-templates/company-reference-project
        ref: v1-ref
        file: /pipelines/includes/base_bootstrap.yml

    ...
  2. Remove any local content that's now in the reference project

    A common reason for creating a custom reference project is to move duplicated content out of individual DataOps projects, so make sure you actually remove that content from each project. Otherwise, the local configuration will override the configuration from your reference project. This may not be immediately apparent, because both copies are probably identical right now.

  3. Test it

    Set up and run a pipeline to verify that everything in your new configuration works correctly in the repointed project.
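To help with step 2, the following Python sketch lists .yml files that exist at the same relative path in both a local project checkout and a checkout of the reference project. It is only a heuristic and assumes that content moved to the reference project kept its relative path; the function name shadowing_files and both directory arguments are hypothetical.

```python
import filecmp
from pathlib import Path


def shadowing_files(local_root: Path, reference_root: Path) -> list[tuple[str, bool]]:
    """List .yml paths present in both trees, with a flag for identical content."""
    results = []
    for ref_file in sorted(reference_root.rglob("*.yml")):
        if not ref_file.is_file():
            continue
        relative = ref_file.relative_to(reference_root)
        local_file = local_root / relative
        if local_file.is_file():
            # shallow=False compares file contents, not just size and mtime.
            identical = filecmp.cmp(local_file, ref_file, shallow=False)
            results.append((relative.as_posix(), identical))
    return results
```

Each returned pair holds a relative path and whether the two copies are byte-for-byte identical. Entries flagged as identical are likely candidates for removal from the local project, while differing copies deserve a manual review first.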

Using a job from the reference project

Your new pipeline definitions will now look like:

full-ci.yml
include:
  ## Your updated bootstrap file
  ### this also includes all the jobs you always wanted to include in each pipeline
  - /pipelines/includes/bootstrap.yml

  ## The standard Snowflake Object Lifecycle Engine jobs (SOLE)
  - project: reference-template-projects/dataops-template/dataops-reference
    ref: 5-stable
    file: /pipelines/includes/default/snowflake_lifecycle.yml

  ## a custom, reusable job from your reference project, not always included
  - project: customer/reference-and-templates/company-reference-project
    ref: v1-ref
    file: /pipelines/includes/default/my-custom-job.yml

Using content from your reference project

If your reference project contains additional content beyond the pipeline configuration files, e.g., dbt libraries for use in MATE jobs, you may need to explicitly include that content before using it. The DataOps pipeline initialization clones your reference project into each pipeline's workspace at the following location:

${DATAOPS_REFERENCE_PROJECT_DIR}/company-reference-project

This allows you to write a job that depends on content from your company reference project. The following example runs a Python application from the scripts directory:

my/example-job.yml
Job using reference project resource:
  extends:
    - .agent_tag
  stage: "A custom stage"
  image: $DATAOPS_PYTHON3_RUNNER_IMAGE
  variables:
    DATAOPS_RUN_PYTHON_SCRIPT: ${DATAOPS_REFERENCE_PROJECT_DIR}/company-reference-project/scripts/my.py
  script:
    - /dataops
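The contents of my.py are not shown in this document. As a hedged sketch, a script like it could locate other files in the cloned reference project through the DATAOPS_REFERENCE_PROJECT_DIR variable. The helper name reference_resource, the file settings.json, and the directory value below are all hypothetical.

```python
import os
from pathlib import Path


def reference_resource(relative_path: str, env=os.environ) -> Path:
    """Resolve a path inside the cloned company reference project."""
    # DATAOPS_REFERENCE_PROJECT_DIR is the directory the reference projects are cloned into.
    base = Path(env["DATAOPS_REFERENCE_PROJECT_DIR"]) / "company-reference-project"
    return base / relative_path


# Illustrative value only; in a pipeline the variable is read from the job environment.
example_env = {"DATAOPS_REFERENCE_PROJECT_DIR": "/workspace/reference"}
print(reference_resource("scripts/settings.json", example_env))
```

Passing the environment as a parameter keeps the helper easy to try outside a pipeline, where the variable is not set.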