Running Pipelines
For an in-depth explanation of the DataOps pipeline types and structure, see the Pipeline Overview documentation.
Multiple pipeline configurations
Your project can have multiple pipeline configuration files that define the jobs needed to fulfill the needs of your data pipeline.
You must add the pipeline configuration files to the root of the project. The data product platform considers any file ending with `-ci.yml` as a pipeline configuration file, e.g., `full-ci.yml`, `my-data-pipe-ci.yml`.
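As a sketch, the naming convention can be checked from a shell. The file names below are examples only, and a throwaway directory stands in for a real project root:

```shell
# Sketch: any root-level file ending in -ci.yml is picked up as a pipeline
# configuration. A temporary directory stands in for the project root here.
project=$(mktemp -d)
touch "$project/full-ci.yml" "$project/my-data-pipe-ci.yml" "$project/README.md"

# Only the two -ci.yml files match the convention; README.md does not.
ls "$project" | grep -- '-ci\.yml$'
```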
Methods to run pipelines
You can run pipelines using a variety of methods. Each method can define which pipeline configuration file to use.
The following is a list of the methods you can use to start a pipeline:
- Run Pipeline form (via the data product platform)
- Schedule form (via the data product platform)
- Commit message (via push, merge request)
- API (via HTTPS POST request)
Run pipeline form
Use this method to run an ad-hoc pipeline via the web user interface, for example when a pipeline has no regular processing time or when you are testing pipeline changes.
Steps to use this method:
1. Navigate to your project.
2. Navigate to CI/CD → Pipelines.
3. Click Run pipeline. You will be redirected to the Run pipeline form view.
4. Fill in the Run pipeline form:
   - Select the branch or tag where your pipeline configuration file exists.
   - Select the pipeline type as your chosen pipeline configuration file.
5. Click Run pipeline to start the pipeline. You will be redirected to the pipeline in progress view, where your pipeline jobs are created and ready to start.
New schedule form
Use this method when you want to run a pipeline configuration file at regular intervals, which makes pipeline runs more consistent.
Steps to use this method:
1. Navigate to your project.
2. Navigate to CI/CD → Schedules.
3. Click New schedule. You will be redirected to the Schedule a new pipeline form view.
4. Fill in the Schedule a new pipeline form:
   - Enter a description that appears on the pipeline schedules view.
   - Select an interval pattern or create your own.
   - Select the cron timezone.
   - Select the branch or tag where your pipeline configuration file exists.
   - Select the pipeline type as your chosen pipeline configuration file.
5. Click Save pipeline schedule. You will be redirected to the Schedules view, where you can see your new schedule.
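Custom interval patterns use standard cron syntax. The patterns below are illustrative examples, not platform defaults:

```
0 6 * * *     every day at 06:00 (in the selected cron timezone)
0 */4 * * *   every 4 hours
0 7 * * 1-5   weekdays at 07:00
```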
Commit message
Use this method when you author a change to your data pipeline that you would like to run as soon as you commit and push the change. This makes the development loop quicker by running a specific pipeline file per development commit.
The data product platform looks for your pipeline configuration filename anywhere in the commit message, in the following form:

```
[<file-name> ci]
```

The following are examples of commit messages where the pipeline file is called `new-ci.yml`:

```
Some commit message [new-ci.yml ci]
[new-ci.yml ci] another message
Longer commit message [new-ci.yml ci] and more
```
When you push any of these messages, the platform runs the `new-ci.yml` pipeline configuration file.
A message that includes `[skip ci]` guarantees that none of the pipelines will run.
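The tag can appear anywhere in the message. As a rough sketch (not the platform's actual parser), the file name can be extracted like this:

```shell
# Rough sketch of extracting the pipeline file from the [<file-name> ci]
# tag in a commit message. The platform's real parsing may differ.
msg='Longer commit message [new-ci.yml ci] and more'
pipeline=$(echo "$msg" | sed -n 's/.*\[\(.*\) ci\].*/\1/p')
echo "$pipeline"   # new-ci.yml
```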
Steps to use this method from the Web IDE:
1. Navigate to your project.
2. Click Web IDE. An instance of VS Code opens, showing your repo files.
3. Make a change to a file in the repo. The Source Control icon shows pending changes.
4. Click this icon and select whether to commit to main or to a new branch.
5. Select the pipeline type as your chosen pipeline configuration file. A message prompts you to validate the branch you're committing to, followed by a confirmation message that summarizes all the details of your commit.
6. Click Yes. A success message displays at the bottom right of VS Code.
7. Click Go to Project to switch to your project details.
8. On the bottom status bar, click the pipeline ID to see a live progress feed of the running pipeline.
Steps to use this method on the command line (these steps assume you have cloned your project locally and made a change):

```shell
git add --all
git commit -m "Some commit message [new-ci.yml ci]"
git push
```

To see the running pipeline:
1. Navigate to your project.
2. Navigate to CI/CD → Pipelines to see the `new-ci.yml` pipeline running.
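One handy variant: an empty commit can carry the tag, re-running a pipeline without changing any files. The snippet below demonstrates this in a throwaway repository; in a real project you would run only the commit and push:

```shell
# Demonstration in a throwaway repo: --allow-empty lets you commit (and
# then push) without modifying any files, so the [new-ci.yml ci] tag alone
# triggers a pipeline run.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

git commit --allow-empty -q -m "Re-run pipeline [new-ci.yml ci]"
git log -1 --pretty=%s   # Re-run pipeline [new-ci.yml ci]
```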
API (REST)
Use this method to integrate DataOps pipelines with scripts and software to use an external scheduling system or event-driven data pipeline architecture.
DataOps.live has a POST REST endpoint `/api/v4/projects/YOUR_PROJECT_ID/trigger/pipeline` that you can use to trigger a project's data pipeline. You will need your project ID, which you can find on the project overview page.
The endpoint expects a minimum of 4 pieces of HTTP form data:
- Pipeline trigger token for authorization: `token=YOUR_PIPELINE_TRIGGER_TOKEN`
- Project ID for the project the pipeline belongs to
- Git ref to use: `ref=YOUR_REF`
- The pipeline configuration file to use: `variables[_PIPELINE_FILE_NAME]=YOUR_PIPELINE_FILE_NAME`

`_PIPELINE_FILE_NAME` is a reserved environment variable for DataOps.live and references the pipeline configuration file.
To be authorized to trigger the pipeline, you need a pipeline trigger token.

Steps to get a pipeline trigger token:
1. Navigate to your project.
2. Navigate to Settings → CI/CD.
3. Click Expand in the Pipeline triggers section. The page shows settings for pipeline trigger tokens.
4. Fill in the Manage your project's triggers form.
5. Click Add trigger. The page refreshes, and your trigger appears in the list.

You can copy the displayed token to call the API and run your pipeline.
To use this method on the command line, you will need to add your trigger token and project ID. Run a curl command with form options:

```shell
curl -X POST --fail \
  -F token=YOUR_PIPELINE_TRIGGER_TOKEN \
  -F ref=YOUR_REF \
  -F 'variables[_PIPELINE_FILE_NAME]=YOUR_PIPELINE_FILE_NAME' \
  https://app.dataops.live/api/v4/projects/YOUR_PROJECT_ID/trigger/pipeline
```
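The endpoint returns JSON describing the new pipeline. As a sketch, assuming a GitLab-style response shape (the sample below is made up, not captured from the API), the pipeline id can be pulled out like this:

```shell
# Sample response (assumed shape); in practice, capture it from the curl
# call with: response=$(curl ...)
response='{"id":4242,"status":"pending","ref":"main"}'

# Crude extraction of the id field; a real script should use a JSON parser.
pipeline_id=$(echo "$response" | grep -o '"id":[0-9]*' | cut -d: -f2)
echo "$pipeline_id"   # 4242
```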
Which pipeline configuration takes precedence?
There are situations where you define a pipeline configuration file in a request and in a commit message, and also have a project default, all at the same time. The platform has an order of precedence to decide which pipeline file to run.
An ordered list of questions decides which pipeline file to run; the first question answered "yes" determines the chosen pipeline file:
1. Is this a scheduled run? Then the pipeline file defined in the schedule is used.
2. Is this run from the platform, API, or a parent pipeline? Then the pipeline file defined in the request is used.
3. Is this run from a Git push or merge request, and does this run include a pipeline file commit message? Then the file defined in the commit is used.
4. Is there a project default? Then the project default is used.
5. Nothing defined? Then `dataops-ci.yml` is used.
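The precedence order can be sketched as a small shell function. The function and its arguments are illustrative only, not part of the platform:

```shell
# Toy model of the precedence order. Each argument is the pipeline file
# named by that source, or empty if that source did not specify one.
choose_pipeline() {
  # order: schedule, request (platform/API/parent), commit message, project default
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return
    fi
  done
  echo "dataops-ci.yml"   # final fallback
}

choose_pipeline "" "full-ci.yml" "new-ci.yml" ""   # request wins: full-ci.yml
```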
You don't need a `dataops-ci.yml` file. If the file is missing, no pipeline runs.
FAQ
What happens when my commit message has a pipeline file name, and I start a pipeline using the platform?
The commit message is used to select the pipeline file only when that commit is pushed or merged. In this scenario, the pipeline file name set via the platform takes precedence over the commit message.
What happens when my commit message has a pipeline file name that does not exist?
The precedence checks look for a project default configuration file. If there is no project default, the platform checks for `dataops-ci.yml`. If there is no `dataops-ci.yml`, no pipeline runs.