Core Concepts
The Basics
What you will learn
In this first section, you will learn the following:
- How to create your first Pipeline and decompose it into multiple Jobs
- How to deploy these Jobs to a Runner and customize the available functionality
- How Stages control the sequential versus concurrent execution of Jobs
- How to parametrize Job execution with Variables
Set aside 30 minutes to complete the section.
Preparation
- Under your top-level Group menu, create a new Demo group
- Directly from the created group, hit New project and use the 'Create a blank project' tile
- Name this project DataOps 101 Enablement Project
Hello World Job
Create your first pipeline by adding a new file, demo-ci.yml. Create your first Hello World job in the pipeline by populating the file with the following code:

demo-ci.yml

Say Hello:
  script:
    - echo "Hello, World!"

Don't forget to Commit.
First Pipeline run
- Navigate to CI / CD > Pipelines
- Choose Run Pipeline on the default branch, using our newly created demo-ci.yml pipeline type
Pipeline stuck
This pipeline gets stuck because we have not told DataOps where to run this job.
Enabling the Shared Runner
Runners act as a proxy into your compute environment. If all your resources are accessible via the internet, we provide the DataOps Shared Runner. To enable this runner:
- Within your project navigate to Settings > CI / CD > Runners
- Expand Shared Runners and click Enable Shared Runners
Deploying to a Runner
Jobs are deployed to runners using a runner tag. Since we are using the shared runner, the tag to use is dataops-101-shared-runner. Once we have wired up the job to the runner, we can rerun the pipeline.
- Navigate to Repository > Files, open the file, and then edit it in the Web IDE
- Add the dataops-101-shared-runner tag
- Choose your Pipeline Behaviour as 'Run demo-ci.yml'
- Choose the default branch
- Commit with a comment such as 'getting the pipeline unstuck'
Say Hello:
  tags: [dataops-101-shared-runner]
  script:
    - echo "Hello, World!"
The pipeline reruns at this point. To verify the results, navigate to "DataOps 101 Enablement Project > CI / CD > Pipelines". From this screen, you can execute "Run Pipeline" again.
Shortcut
Did you see what we did in the video? We chose the pipeline we wanted to run from the dropdown menu. The resulting pipeline ID then appeared in the status bar below, and we were able to open it in a new tab to view the pipeline without losing the current editing context.
Bonus Work
As a bonus, you can install your own runner when you need more flexibility in your deployment, e.g., to access resources behind a firewall in a hybrid deployment.
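Once your own runner is registered, pointing a job at it is just a matter of its tag; a minimal sketch, where my-private-runner is a hypothetical tag you would assign during registration:

Say Hello:
  tags: [my-private-runner]  # hypothetical tag for a self-hosted runner behind your firewall
  script:
    - echo "Hello from inside the network!"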
Customizing the Orchestrator
Jobs are not executed directly inside the runner. Instead, each job runs inside a container backed by a container image. These container images are the DataOps Orchestrators.
You typically choose the type of orchestrator based on the workload. In our case, we keep running a Python script, so let's customize the orchestrator to control which Python version is available for the job.
In real-world examples, you will use the Snowflake Object Lifecycle Engine Orchestrator and your choice of ETL/ELT vendor integration.
To change the orchestrator to our desired Python3 Orchestrator, we need to add the image reference as follows:
- Edit demo-ci.yml again and add the image property
- Rerun the pipeline
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
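To double-check which Python version the orchestrator image provides, you can print it from the job itself; a minimal sketch, assuming the image exposes a python3 binary on its PATH:

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - python3 --version  # assumption: the orchestrator image ships python3 on the PATH
    - echo "Hello, World!"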
See job logs
Did you see what we did in the video? We clicked on the job to see the log output. For jobs that take longer, the output comes back in near real time.
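If you want to watch the log stream in, give the job something that takes a little while; a hypothetical variant of our job for experimenting (the loop just prints a line every two seconds):

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - for i in 1 2 3 4 5; do echo "Hello number $i"; sleep 2; done  # emits output gradually so you can watch it stream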
Stages
Stages are the fundamental method for sequencing jobs in a pipeline. Let's see how:
- First, add a second job, commit, and run:
Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
Both jobs run concurrently in a default stage called Test. Visually, you can see the two jobs grouped in the same stage. However, we want more control over how this works.
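For illustration only, the pipeline above behaves as if that default stage were spelled out; a sketch, assuming the platform follows the usual GitLab CI convention that jobs without an explicit stage land in the default one:

stages:
  - Test

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  stage: Test  # assumption: equivalent to omitting the stage key
  script:
    - echo "Hello, World!"

(Say Hello Again would be assigned the same way.)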
- Define two stages at the top of demo-ci.yml:
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
- Assign each job to a stage:

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
- Commit and run
Now we are getting more control over how and when jobs run.
What if the first job fails?
If a job in an earlier stage fails, jobs in subsequent stages do not run.
- Break the first job by adding an invalid command, then commit and run:
stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
    - I am not a valid command

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
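As an aside, if you ever want later stages to proceed despite a known-failing job, GitLab-style pipelines offer the allow_failure keyword; a minimal sketch, assuming the DataOps platform honors it:

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  allow_failure: true  # assumption: the platform supports GitLab CI's allow_failure keyword
  script:
    - I am not a valid command  # still fails, but no longer blocks Stage Two

Remove the invalid command again before moving on to the next step.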
Variables
- Create a variables block in your demo-ci.yml:
variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, World!"
- Change both jobs to use the new variable:

variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
- Commit and run
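Variables can also be defined per job; a sketch of overriding the greeting for the second job only, assuming the usual GitLab CI precedence where job-level variables take priority over top-level ones (Alex is just an illustrative value):

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  variables:
    MY_NAME: Alex  # hypothetical job-level value; overrides the top-level Sam for this job only
  script:
    - echo "Hello, $MY_NAME!"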
Checkpoint 1
We have now built a DataOps project that:
- Leverages all the core concepts of pipelines, stages, and jobs
- Executes on the shared runner, binding each job to a concrete Python Orchestrator
- Introduces customization with variables
Now take a break before you head to the next chapter.