Core Concepts
The Basics
What you will learn
In this first section, you will learn the following:
- How to create your first Pipeline and decompose it into multiple Jobs
- How to deploy these Jobs to a Runner and customize the available functionality
- How Stages control the sequential versus concurrent execution of Jobs
- How to parametrize Job execution with Variables
Set aside 30 minutes to complete the section.
Preparation
- Under your top-level Group menu, create a new Demo group
- Directly from the created group, hit New project and use the 'Create a blank project' tile
- Name this project DataOps 101 Enablement Project
Hello World Job
Create your first pipeline by adding a new file, demo-ci.yml. Create your first Hello World job in the pipeline by populating the file with the following code:

demo-ci.yml

Say Hello:
  script:
    - echo "Hello, World!"

Don't forget to Commit.
First Pipeline run
- Navigate to CI / CD > Pipelines
- Choose Run Pipeline on the default branch, using our newly created demo-ci.yml pipeline type
Pipeline stuck
This pipeline gets stuck because we have not told DataOps where to run this job.
Enabling the Shared Runner
Runners act as a proxy into your compute environment. If all your resources are accessible via the internet, we provide the DataOps Shared Runner. To enable this runner:
- Within your project navigate to Settings > CI / CD > Runners
- Expand Shared Runners and click Enable Shared Runners
Deploying to a Runner
Jobs are deployed to runners using a runner tag. Since we are using the shared runner, the tag to use is dataops-101-shared-runner. Once we have wired up the job to the runner, we can rerun the pipeline.
- Navigate to Repository > Files, open the file, and then edit it in the Web IDE
- Add the dataops-101-shared-runner tag
- Choose your Pipeline Behaviour as 'Run demo-ci.yml'
- Choose the default branch
- Commit with a comment such as 'getting the pipeline unstuck'
Say Hello:
  tags: [dataops-101-shared-runner]
  script:
    - echo "Hello, World!"
The pipeline reruns at this point. To verify the results, navigate to "DataOps 101 Enablement Project > CI / CD > Pipelines". From this screen, you can execute "Run Pipeline" again.
Shortcut
Did you see what we did in the video? We chose the pipeline we wanted to run from the dropdown menu. The resulting pipeline ID then appeared in the status bar below, and we were able to open it in a new tab to view the pipeline without losing the current editing context.
Bonus Work
As a bonus, you can install your own runner when you need more flexibility in your deployment, e.g., to access resources behind a firewall in a hybrid deployment.
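Once your own runner is registered, pointing a job at it is just a matter of its tag; a minimal sketch, where my-private-runner is a hypothetical tag you would assign during registration:

Say Hello:
  tags: [my-private-runner]  # hypothetical tag for a self-hosted runner behind your firewall
  script:
    - echo "Hello from inside the network!"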
Customizing the Orchestrator
Jobs are not executed directly inside the runner. Instead, each job runs inside a container backed by a container image. These container images are the DataOps Orchestrators.
You typically choose the type of orchestrator based on the workload. In our case, we keep running a Python script, so let's customize the orchestrator to control which Python version is available for the job.
In real-world examples, you will use the Snowflake Object Lifecycle Engine Orchestrator and your choice of ETL/ELT vendor integration.
To change the orchestrator to our desired Python3 Orchestrator, we need to add the image reference as follows:
- Edit demo-ci.yml again and add the image property
- Rerun the pipeline
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
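To double-check which Python version the orchestrator image provides, you can print it from the job itself; a minimal sketch, assuming the image exposes a python3 binary on its PATH:

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - python3 --version  # assumption: the orchestrator image ships python3 on the PATH
    - echo "Hello, World!"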
See job logs
Did you see what we did in the video? We clicked on the job to see the log output. For jobs that take longer, the output comes back in near real time.
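If you want to watch the log stream in, give the job something that takes a little while; a hypothetical variant of our job for experimenting (the loop just prints a line every two seconds):

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - for i in 1 2 3 4 5; do echo "Hello number $i"; sleep 2; done  # emits output gradually so you can watch it stream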
Stages
Stages are the fundamental method for sequencing jobs in a pipeline. Let's see how:
- First, add a second job, commit, and run:
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
Both jobs run concurrently in a default stage called Test. Visually, you can see the two jobs grouped in the same stage. However, we want more control over how this works.
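For illustration only, the pipeline above behaves as if that default stage were spelled out; a sketch, assuming the platform follows the usual GitLab CI convention that jobs without an explicit stage land in the default one:

stages:
  - Test

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  stage: Test  # assumption: equivalent to omitting the stage key
  script:
    - echo "Hello, World!"

(Say Hello Again would be assigned the same way.)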
- Define two stages at the top of demo-ci.yml:
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
- Assign each job to a stage:

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
- Commit and run
Now we are getting more control over how and when jobs run.
What if the first job fails?
If a job in an earlier stage fails, jobs in subsequent stages do not run.
- Break the first job by adding an invalid command, then commit and run:
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
    - I am not a valid command

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
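As an aside, if you ever want later stages to proceed despite a known-failing job, GitLab-style pipelines offer the allow_failure keyword; a minimal sketch, assuming the DataOps platform honors it:

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  allow_failure: true  # assumption: the platform supports GitLab CI's allow_failure keyword
  script:
    - I am not a valid command  # still fails, but no longer blocks Stage Two

Remove the invalid command again before moving on to the next step.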
Variables
- Create a variables block in your demo-ci.yml:
variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
- Change both jobs to use the new variable:

variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
- Commit and run
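Variables can also be defined per job; a sketch of overriding the greeting for the second job only, assuming the usual GitLab CI precedence where job-level variables take priority over top-level ones (Alex is just an illustrative value):

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  variables:
    MY_NAME: Alex  # hypothetical job-level value; overrides the top-level Sam for this job only
  script:
    - echo "Hello, $MY_NAME!"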
Checkpoint 1
We have now built a DataOps project that:
- Leverages all the core concepts of pipelines, stages, and jobs
- Executes on the shared runner, binding each job to a concrete Python Orchestrator
- Introduces customization with variables
Now take a break before you head to the next chapter.