
Core Concepts

The Basics

What you will learn

In this first section, you will learn:

  1. How to create your first Pipeline and decompose it into multiple Jobs
  2. How to deploy these Jobs to a Runner and customize the available functionality
  3. How to use Stages to control the sequential versus concurrent execution of Jobs
  4. How to parameterize job execution with Variables

Set aside 30 minutes to complete the section.

Preparation

  1. Under your top-level Group menu, create a new Demo group
  2. From the newly created group, click New project and choose the 'Create a blank project' tile
  3. Name this project DataOps 101 Enablement Project

Hello World Job

  1. Create your first pipeline by adding a new file, demo-ci.yml

  2. Create your first job, Say Hello, by populating the file with the following code:

    demo-ci.yml
    ```yaml
    Say Hello:
      script:
        - echo "Hello, World!"
    ```
  3. Don't forget to commit your changes
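A job's script is a list, so a job can run more than one command. The sketch below assumes DataOps follows the usual GitLab CI convention that each entry is a separate shell command executed in order, and that the job stops and is marked as failed as soon as one command fails. The second echo line is purely illustrative.

```yaml
Say Hello:
  script:
    # Each list entry is a separate shell command, executed top to bottom
    - echo "Hello, World!"
    # Illustrative second command; it only runs if the first one succeeds
    - echo "This line runs second"
```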

First Pipeline run

  1. Navigate to CI / CD > Pipelines
  2. Click Run Pipeline, select the default branch, and choose the newly created demo-ci.yml pipeline type
Pipeline stuck

The pipeline gets stuck because we have not yet told DataOps which runner should execute the job.

Enabling the Shared Runner

Runners act as a proxy into your compute environment. If all the resources your jobs need are accessible over the internet, you can use the DataOps Shared Runner that we provide. To enable this runner:

  1. Within your project, navigate to Settings > CI / CD > Runners
  2. Expand Shared Runners and click Enable Shared Runners

Deploying to a Runner

Jobs are deployed to runners using a runner tag. Since we are using the shared runner, the tag to use is dataops-101-shared-runner. Once we have wired up the job to the runner, we can rerun the pipeline.

  1. Navigate to Repository > Files, open demo-ci.yml, and edit it in the Web IDE
  2. Add the dataops-101-shared-runner tag to the job
  3. Choose your Pipeline Behaviour as 'Run demo-ci.yml'
  4. Choose the default branch
  5. Commit with a message such as "getting the pipeline unstuck"
demo-ci.yml
```yaml
Say Hello:
  tags: [dataops-101-shared-runner]
  script:
    - echo "Hello, World!"
```

The pipeline now reruns. To verify the results, navigate to "DataOps 101 Enablement Project > CI / CD > Pipelines". From this screen, you can also click "Run Pipeline" again.

Shortcut

Did you see what we did in the video? We chose the pipeline we wanted to run from the dropdown menu. The resulting pipeline ID then appeared in the status bar below, and we opened it in a new tab to view the pipeline without losing the current editing context.

Bonus Work

As a bonus exercise, you can install your own runner when you need more flexibility in your deployment, for example, to access resources behind a firewall in a hybrid deployment.
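If you do install your own runner, jobs reach it through a runner tag in exactly the same way as with the shared runner. In this sketch, `my-own-runner` is a hypothetical tag: substitute whatever tag you assign when you register your runner.

```yaml
Say Hello:
  # Hypothetical tag: replace with the tag you gave your own runner
  tags: [my-own-runner]
  script:
    - echo "Hello, World!"
```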

Customizing the Orchestrator

Jobs are not executed inside the runner. Instead, each job runs inside a container started from a container image. These images are the DataOps Orchestrators.

You typically choose the type of orchestrator based on the workload. In our case, we keep things simple and use a Python-based orchestrator. To control which Python version is available to your job, let's customize the orchestrator. The resulting deployment looks as follows:

(Figure: communication between the DataOps app, the runner, and the orchestrators)

In real-world projects, you will typically use the Snowflake Object Lifecycle Engine Orchestrator and the integration for your chosen ETL/ELT vendor.

To switch to the Python 3 Orchestrator, add the image reference as follows:

  1. Edit demo-ci.yml again and add the image property
  2. Rerun the pipeline
demo-ci.yml
```yaml
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```
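As you add more jobs, each one needs the same tags and image lines. The sketch below assumes DataOps honours GitLab CI's `default` keyword, which applies these settings to every job that does not override them. Treat it as an optional tidy-up rather than a required step in this tutorial.

```yaml
# Assumption: DataOps supports GitLab CI's `default` keyword
default:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable

# Inherits the runner tag and orchestrator image from `default`
Say Hello:
  script:
    - echo "Hello, World!"
```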
See job logs

Did you see what we did in the video? We clicked on the job to see its log output. For longer-running jobs, the output streams back in near real time.
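The job log is also a quick way to confirm which Python version the orchestrator provides. This sketch adds one command to the job and assumes that `python3` is on the PATH inside the `dataops-python3-runner` image; the reported version appears in the job log.

```yaml
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
    # Assumption: python3 is available inside this orchestrator image
    - python3 --version
```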

Stages

Stages are the fundamental method for sequencing jobs in a pipeline. Let's see how:

  1. First, add a second job, then commit and run:
demo-ci.yml
```yaml
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```

Both jobs run concurrently in a default stage called Test. Visually, you can see the two jobs grouped in the same stage. However, we want more control over how this works.

  1. Define two stages at the top of demo-ci.yml:
demo-ci.yml
```yaml
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```
  2. Assign each job to a stage:
demo-ci.yml
```yaml
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```
  3. Commit and run

Now we have more control over how and when jobs run: Say Hello Again starts only after Say Hello in Stage One has completed.
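Stages let you mix sequential and concurrent execution. In this sketch, `Say Hi` is a hypothetical third job placed in Stage One alongside Say Hello, so the two can run concurrently (subject to runner capacity), while Say Hello Again still waits for both to finish.

```yaml
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

# Hypothetical extra job: same stage as Say Hello, so they can run side by side
Say Hi:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hi, World!"

# Runs only after every job in Stage One has succeeded
Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```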

What if the first job fails?

If a job in an earlier stage fails, jobs in later stages do not run.

  1. Break the first job on purpose by adding an invalid command, then commit and run:
demo-ci.yml
```yaml
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
    - I am not a valid command

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```
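Sometimes a failing job should not block the rest of the pipeline. Assuming DataOps follows GitLab CI's `allow_failure` semantics, marking the broken job as allowed to fail should let Say Hello Again in Stage Two run anyway, with the failure reported as a warning instead of stopping the pipeline.

```yaml
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  # Assumption: GitLab CI allow_failure semantics apply in DataOps
  allow_failure: true
  script:
    - echo "Hello, World!"
    - I am not a valid command

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```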

Variables

  1. Create a variables block at the top of demo-ci.yml:
demo-ci.yml
```yaml
variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
```
  2. Change both jobs to use the new variable:
demo-ci.yml
```yaml
variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```
  3. Commit and run
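Variables can also be set on a single job. Assuming DataOps follows GitLab CI precedence, where a job-level variable overrides a top-level one with the same name, the sketch below keeps Sam for Say Hello but greets a different name in Say Hello Again. The value `Alex` is purely illustrative.

```yaml
variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    # Uses the top-level value
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  variables:
    # Illustrative override: applies to this job only
    MY_NAME: Alex
  script:
    - echo "Hello, $MY_NAME!"
```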

Checkpoint 1

We have now built a DataOps project that:

  • Uses the core concepts of pipelines, stages, and jobs
  • Executes its jobs on the shared runner, with each job bound to a concrete Python Orchestrator
  • Is customized with variables

Now take a break before you head to the next chapter.