
Running Pipelines

note

For an in-depth explanation of the DataOps pipeline types and structure, see the Pipeline Overview documentation.

Multiple pipeline configurations

Your project can have multiple pipeline configuration files, each defining the jobs your data pipeline needs.

You must add the pipeline configuration files to the root of the project. The data product platform will consider any file ending with -ci.yml as a pipeline configuration file, e.g., full-ci.yml, my-data-pipe-ci.yml.
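As a sketch, the naming convention can be checked like this (`is_pipeline_config` is an illustrative helper, not a platform function):

```python
# Hypothetical helper illustrating the platform's naming rule: any file
# ending in "-ci.yml" is treated as a pipeline configuration file.
def is_pipeline_config(filename: str) -> bool:
    return filename.endswith("-ci.yml")

print(is_pipeline_config("full-ci.yml"))          # True
print(is_pipeline_config("my-data-pipe-ci.yml"))  # True
print(is_pipeline_config("variables.yml"))        # False
```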

Methods to run pipelines

You can run pipelines using a variety of methods. Each method can define which pipeline configuration file to use.

The following is a list of the methods you can use to start a pipeline:

Run pipeline form

Use this method to run an ad-hoc pipeline from the web user interface, for example when a pipeline has no regular processing time or when you are testing pipeline changes.

Steps to use this method:

  1. Navigate to your project.

  2. Navigate to CI/CD → Pipelines.

    Main nav CI/CD menu item highlighted

  3. Click Run pipeline. You will be redirected to the Run pipeline form view.

    Pipelines view highlighting Run pipeline button

  4. Fill in the Run pipeline form.

    Run pipeline form with fields highlighted

    1. Select the branch or tag where your pipeline configuration file exists.
    2. Under Pipeline type, select your chosen pipeline configuration file.
  5. Click Run pipeline to start the pipeline. You will be redirected to the pipeline in progress view, where your pipeline jobs are created and ready to start.

New schedule form

Use this method to run a pipeline configuration file at regular intervals, which makes pipeline runs more consistent.

Steps to use this method:

  1. Navigate to your project.

  2. Navigate to CI/CD → Schedules.

    Main nav CI/CD Schedules menu item highlighted

  3. Click New schedule. You will be redirected to the Schedule a new pipeline form view.

    Schedules view with New schedule btn highlighted

  4. Fill in the Schedule a new pipeline form.

    New schedule form with fields highlighted

    1. Enter a description that appears on the pipeline schedules view.
    2. Select an interval pattern or create your own.
    3. Select the cron timezone.
    4. Select the branch or tag where your pipeline configuration file exists.
    5. Under Pipeline type, select your chosen pipeline configuration file.
  5. Click Save pipeline schedule. You will be redirected to the Schedules view where you can see your new schedule.
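A custom interval pattern uses standard five-field cron syntax. For example, this illustrative pattern (not a platform default) runs at 06:00 in the selected cron timezone, Monday through Friday:

```
# minute hour day-of-month month day-of-week
0 6 * * 1-5
```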

Commit message

Use this method when you author a change to your data pipeline that you would like to run as soon as you commit and push the change. This makes the development loop quicker by running a specific pipeline file per development commit.

The data product platform looks for your pipeline configuration filename anywhere in the commit message in the following form:

[<file-name> ci]

The following are examples of commit messages where the pipeline file is called new-ci.yml:

  • Some commit message [new-ci.yml ci]
  • [new-ci.yml ci] another message
  • Longer commit message [new-ci.yml ci] and more

When you push any of these messages, it will run the new-ci.yml pipeline configuration file.

info

A message that includes [skip ci] guarantees none of the pipelines will run.

Steps to use this method from the Web IDE:

  1. Navigate to your project.

  2. Click Web IDE. An instance of VS Code opens, showing your repo files.

    Project view with Web IDE btn highlighted

  3. Make a change to a file in the repo. The Source Control icon shows pending changes.

  4. Click the Source Control icon and choose whether to commit to main or to a new branch.

    Web IDE commit changes form

  5. Under Pipeline type, select your chosen pipeline configuration file. A message prompts you to validate the branch you're committing to, followed by a confirmation message that summarizes the details of your commit.

    Web IDE commit changes form

  6. Click Yes. A success message displays on the bottom right of VS Code.

  7. Click Go to Project to switch to your project details.

  8. On the bottom status bar, click the pipeline ID to see a live progress feed of the running pipeline.

Steps to use this method on the command line (these steps assume you have cloned your project locally and made a change):

  1. git add --all
  2. git commit -m "Some commit message [new-ci.yml ci]"
  3. git push

CLI commit changes with message

To see the running pipeline:

  1. Navigate to your project.
  2. Navigate to CI/CD → Pipelines to see the new-ci.yml pipeline running.

API (REST)

Use this method to integrate DataOps pipelines with scripts and software, for example with an external scheduling system or an event-driven data pipeline architecture.

DataOps.live has a POST REST endpoint /api/v4/projects/YOUR_PROJECT_ID/trigger/pipeline that you can use to trigger a project's data pipeline. You will need your project ID, which you can find on the project overview page.

The endpoint expects the project ID in the URL path, plus a minimum of three pieces of HTTP form data:

  • A pipeline trigger token for authorization: token=YOUR_PIPELINE_TRIGGER_TOKEN
  • The Git ref to use: ref=YOUR_REF
  • The pipeline configuration file to use: 'variables[_PIPELINE_FILE_NAME]=YOUR_PIPELINE_FILE_NAME'
note

_PIPELINE_FILE_NAME is a reserved variable in DataOps.live that references the pipeline configuration file.

You need to be authorized to trigger the pipeline using a pipeline trigger token.

Steps to get a pipeline trigger token:

  1. Navigate to your project.
  2. Navigate to Settings → CI/CD.
  3. Click Expand in the Pipeline triggers section. The page shows settings for pipeline trigger tokens.
  4. Fill in the Manage your project's triggers form.
  5. Click Add trigger. The page refreshes, and your trigger appears in the list.

You can copy the displayed token to call the API and run your pipeline.

To use this method on the command line, you will need to add your trigger token and project ID.

Run a curl command with form options:

curl -X POST --fail \
  -F token=YOUR_PIPELINE_TRIGGER_TOKEN \
  -F ref=YOUR_REF \
  -F 'variables[_PIPELINE_FILE_NAME]=YOUR_PIPELINE_FILE_NAME' \
  https://app.dataops.live/api/v4/projects/YOUR_PROJECT_ID/trigger/pipeline
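The same request can be built from Python. This is a minimal sketch using the endpoint and form fields from the curl command; the project ID, token, ref, and file name are placeholders you must supply:

```python
from urllib.parse import urlencode

# Build the trigger URL and form body for the endpoint shown above.
# build_trigger_request is an illustrative helper, not a platform API.
def build_trigger_request(project_id, token, ref, pipeline_file):
    url = f"https://app.dataops.live/api/v4/projects/{project_id}/trigger/pipeline"
    form = urlencode({
        "token": token,
        "ref": ref,
        "variables[_PIPELINE_FILE_NAME]": pipeline_file,
    })
    return url, form

url, body = build_trigger_request("1234", "YOUR_PIPELINE_TRIGGER_TOKEN",
                                  "main", "full-ci.yml")
# To send it: urllib.request.urlopen(urllib.request.Request(url, body.encode()))
print(url)
```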

Which pipeline configuration takes precedence?

There are situations where a pipeline configuration file is defined in a request, in a commit message, and as a project default simultaneously. The platform uses an order of precedence to decide which pipeline file to run.

An ordered list of questions decides which pipeline file to run (the first question answered "yes" determines the chosen pipeline file):

  1. Is this a scheduled run? Then the pipeline file defined in the schedule is used.
  2. Is this run from the platform, API, or a parent pipeline? Then the pipeline file defined in the request is used.
  3. Is this run from a Git push or merge request, and does this run include a pipeline file commit message? Then the file defined in the commit is used.
  4. Is there a project default? Then the project default is used.
  5. Nothing defined? Then dataops-ci.yml is used.
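The ordered checks above can be sketched as a function. This is a hedged illustration of the precedence order; the function and its parameters are illustrative, not platform internals. Each argument is the pipeline file name set by that source, or None if that source set nothing:

```python
# Resolve which pipeline configuration file runs, following the
# precedence order documented above.
def resolve_pipeline_file(schedule_file=None, request_file=None,
                          commit_file=None, project_default=None):
    if schedule_file:         # 1. scheduled run
        return schedule_file
    if request_file:          # 2. platform, API, or parent pipeline
        return request_file
    if commit_file:           # 3. [<file-name> ci] commit message
        return commit_file
    if project_default:       # 4. project default
        return project_default
    return "dataops-ci.yml"   # 5. fallback

print(resolve_pipeline_file(request_file="full-ci.yml",
                            commit_file="new-ci.yml"))  # full-ci.yml
```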
info

You don't need a dataops-ci.yml. If the file is missing and nothing else is defined, no pipeline runs.

FAQ

What happens when my commit message has a pipeline file name, and I start a pipeline using the platform?

We only use the commit message as the pipeline file name when that commit is pushed or merged. In this scenario, the pipeline file name set by the platform takes precedence over the commit message.

What happens when my commit message has a pipeline file name that does not exist?

The precedence checks will see if there is a project default configuration file. If there is no project default, it checks for a dataops-ci.yml. If there is no dataops-ci.yml, no pipeline runs.