Structuring your Project
Getting a bit more advanced
In the previous section, we learned about the core concepts of pipelines. We have a working project now, but it isn't going to scale as we grow to meet the real-world DataOps requirements of most companies. So let's look at how we can add a bit more structure to the overall project.
What you will learn
In this section, you will learn how to reuse configuration. We start by using includes to move commonly used resources into separate files. We then show you how to extend jobs from base jobs using inheritance. To make resources easy to find, we reorganize files into folders and make the project modular. The section closes with variable mapping for parameterized jobs.
Set aside 20 minutes to complete the section.
Including files
Our pipeline config is getting busy, so let's move some of the more static content out of it. The pipeline should run exactly as before, just with a better-organized configuration.
- Create a new file `pipelines/includes/bootstrap.yml`.
- Move the `stages` and `variables` blocks into it.
- Include the bootstrap file in the main pipeline (`demo-ci.yml`):
```yaml
include:
  - /pipelines/includes/bootstrap.yml

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```
- Commit and Run
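For reference, `pipelines/includes/bootstrap.yml` might now look like the sketch below. It assumes the variable value and stage names used later in this section (`MY_NAME: Sam`, `Stage One`, `Stage Two`). Keep whatever values your own pipeline already had.

```yaml
# pipelines/includes/bootstrap.yml (sketch; values assumed from later in this section)
# Global variables, available to every job in the pipeline
variables:
  MY_NAME: Sam

# Stages run in the order listed here
stages:
  - Stage One
  - Stage Two
```

Because `demo-ci.yml` includes this file, the jobs can still use `$MY_NAME` and refer to `Stage One` and `Stage Two` as before.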
Extending jobs
We currently have quite a bit of repetition between these two jobs. Repetition is bad! To avoid repeating the tags block, we can use a base job and extend from it.
- Create a new template job called `.agent_tag` in `demo-ci.yml`. Prefixing the name with `.` tells the system that this isn't a job we want to run on its own.
- Update both jobs to extend from this base job using the `extends` keyword:
```yaml
include:
  - /pipelines/includes/bootstrap.yml

.agent_tag:
  tags: [dataops-101-shared-runner]

Say Hello:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  extends:
    - .agent_tag
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```
- We can now go one step further: move this base job into its own file, `pipelines/includes/config/agent_tag.yml`, and then include that file in `bootstrap.yml`.
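`pipelines/includes/config/agent_tag.yml`:

```yaml
.agent_tag:
  tags: [dataops-101-shared-runner]
```

`pipelines/includes/bootstrap.yml`:

```yaml
include:
  - /pipelines/includes/config/agent_tag.yml

variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two
```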
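It can help to picture a job after `extends` has been resolved. This shows why the pipeline still behaves exactly as before. The sketch below assumes GitLab CI-style `extends` semantics, where the keys of the base job are merged into the job that extends it. Under that assumption, the effective `Say Hello` job is equivalent to the following. You never write this version yourself; it only illustrates the result of the merge.

```yaml
# Effective "Say Hello" job once .agent_tag has been merged in (illustration only)
Say Hello:
  tags: [dataops-101-shared-runner]   # inherited from .agent_tag
  stage: Stage One                    # defined by the job itself
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```

If a job sets a key that its base job also defines, the job's own value takes precedence. This is what makes a base job a sensible default rather than a fixed rule.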
Introducing modularity
Now we can move more content into separate files. Note that this adds no new pipeline functionality: every run should look the same as before. We are just giving ourselves a clean project structure for the future!
- Create a new file `pipelines/includes/config/variables.yml` and move the `variables` block from `bootstrap.yml` into it.
- Create a new file `pipelines/includes/config/stages.yml` and move the `stages` block from `bootstrap.yml` into it.
- Include both files in `bootstrap.yml`:
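`pipelines/includes/config/variables.yml`:

```yaml
variables:
  MY_NAME: Sam
```

`pipelines/includes/config/stages.yml`:

```yaml
stages:
  - Stage One
  - Stage Two
```

`pipelines/includes/bootstrap.yml`:

```yaml
include:
  - /pipelines/includes/config/agent_tag.yml
  - /pipelines/includes/config/variables.yml
  - /pipelines/includes/config/stages.yml
```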
We can also move the jobs into their own files.
- Create a new file `pipelines/includes/local_includes/say_hello.yml` and move the `Say Hello` job into it.
- Create a new file `pipelines/includes/local_includes/say_hello_again.yml` and move the `Say Hello Again` job into it.
- Include both files in `demo-ci.yml` by adding them to the `include` block:
```yaml
include:
  - /pipelines/includes/bootstrap.yml
  - /pipelines/includes/local_includes/say_hello.yml
  - /pipelines/includes/local_includes/say_hello_again.yml
```
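After this step, `demo-ci.yml` holds nothing but its `include` block, and each job lives in its own file. The sketch below assumes you moved the jobs unchanged from the version shown earlier in this section. The `.agent_tag` reference should still resolve, assuming GitLab CI-style behavior where all included files are merged into one configuration before `extends` is applied. That merge happens because `bootstrap.yml` includes `agent_tag.yml`.

```yaml
# pipelines/includes/local_includes/say_hello.yml (sketch)
Say Hello:
  extends:
    - .agent_tag          # defined in pipelines/includes/config/agent_tag.yml
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```

```yaml
# pipelines/includes/local_includes/say_hello_again.yml (sketch)
Say Hello Again:
  extends:
    - .agent_tag
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```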
Multiple jobs with variables
First, let's create some duplication. We will use it to showcase variable injection later on.
- In `pipelines/includes/config/variables.yml`, create three new variables:
```yaml
variables:
  MY_NAME: Sam
  NAME1: Justin
  NAME2: Guy
  NAME3: Colin
```
- In `pipelines/includes/local_includes/say_hello.yml`, duplicate the current job three times and use the three new variables:
```yaml
Say Hello:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello to Person 1:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME1!"

Say Hello to Person 2:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME2!"

Say Hello to Person 3:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME3!"
```
- Commit and Run
Now we have far too much duplication between these jobs. Let's create another base job that holds all the shared configuration, and have the jobs inherit from it.
- Create `pipelines/includes/local_includes/base_hello.yml`:
```yaml
.base_hello:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME!"
```
Notice that we moved everything the jobs have in common, from the agent tag to the image, into the base job. We also changed the variable name to just `$NAME` so that each job can override it in the next steps.
- Then include `base_hello.yml` in the `include` section of `demo-ci.yml`:
```yaml
include:
  - /pipelines/includes/bootstrap.yml
  - /pipelines/includes/local_includes/say_hello.yml
  - /pipelines/includes/local_includes/say_hello_again.yml
  - /pipelines/includes/local_includes/base_hello.yml
```
Finally, here is where the magic happens. The base job's script references `$NAME`, and each job overrides `NAME` with the value of another variable, from `$MY_NAME` through `$NAME3`.
- Amend the jobs in `pipelines/includes/local_includes/say_hello.yml` to extend from the base job:
```yaml
Say Hello:
  extends:
    - .base_hello
  variables:
    NAME: $MY_NAME

Say Hello to Person 1:
  extends:
    - .base_hello
  variables:
    NAME: $NAME1

Say Hello to Person 2:
  extends:
    - .base_hello
  variables:
    NAME: $NAME2

Say Hello to Person 3:
  extends:
    - .base_hello
  variables:
    NAME: $NAME3
```
- Commit and Run
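To see how the pieces combine, here is a sketch of what `Say Hello to Person 1` effectively becomes once both levels of `extends` are merged. The job extends `.base_hello`, which itself extends `.agent_tag`. The sketch assumes GitLab CI-style merging and variable expansion, where a variable's value may reference another variable. Under those assumptions, `NAME` takes the value of `$NAME1`, so with the values in `variables.yml` the job should print `Hello, Justin!`.

```yaml
# Effective "Say Hello to Person 1" job after extends is resolved (illustration only)
Say Hello to Person 1:
  tags: [dataops-101-shared-runner]                   # from .agent_tag
  stage: Stage One                                    # from .base_hello
  image: dataopslive/dataops-python3-runner:5-stable  # from .base_hello
  variables:
    NAME: $NAME1                                      # the job's own value, expands to Justin
  script:
    - echo "Hello, $NAME!"                            # from .base_hello
```

The base job never knows which person it greets. Each concrete job only supplies the one value that differs, which is what keeps the duplication to a minimum.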
Checkpoint 2
We have now built a DataOps project that:
- Organizes key concepts such as stages, variables, and the agent tag into separate files
- Defines multiple jobs with minimal duplication, using base job inheritance
- Injects variable values from other variables
Your project should match the following structure now:
Time to take a break again before you head to the next section.