Structuring your Project

Getting a bit more advanced

In the previous section, we learned about the core concepts of pipelines. We have a working project now, but this isn't going to scale as we grow to the real-world DataOps requirements of most companies. So let's look at how we can add a bit more structure to the overall project.

What you will learn

In this section, you learn how to reuse configuration. We start by using includes to factor out commonly used resources. We then show you how to derive jobs from base jobs using inheritance. To make resources easy to find, we reorganize files into folders to create modularity. The section closes with variable mapping to support parameterized jobs.

Set aside 20 minutes to complete the section.

Including files

Our Pipeline config is getting busy now, so let us move out some of the more static content. It should run exactly as before, but with better organization of configurations.

  • Create a new file pipelines/includes/bootstrap.yml
  • Move the stages and variables blocks into it.
  • Include the bootstrap file in the main pipeline (demo-ci.yml)
demo-ci.yml
include:
  - /pipelines/includes/bootstrap.yml

Say Hello:
  tags: [dataops-101-shared-runner]
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  tags: [dataops-101-shared-runner]
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
  • Commit and Run
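
For reference, the new bootstrap file at this point might look like the sketch below. It assumes the stages and variables blocks you moved hold the same values shown later in this section (MY_NAME: Sam, Stage One, Stage Two); if your project from the previous section uses different values, keep yours.

```yaml
# pipelines/includes/bootstrap.yml
# Assumed content: the two blocks moved out of demo-ci.yml unchanged.
variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two
```

Because the blocks are only relocated, not changed, the pipeline should behave exactly as it did before the move.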

Extending jobs

We currently have quite a bit of repetition between these two jobs. Repetition is bad! To avoid repeating the tags block, we can use a base job and extend from it.

  • Create a new template job called .agent_tag in demo-ci.yml. The leading . tells the system that this is not a job to run on its own.
  • Update both jobs to extend from this base job
demo-ci.yml
include:
  - /pipelines/includes/bootstrap.yml

.agent_tag:
  tags: [dataops-101-shared-runner]

Say Hello:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello Again:
  extends:
    - .agent_tag
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
  • We can now go one step further and move this base job into its own file pipelines/includes/config/agent_tag.yml and then include this into our bootstrap.yml file.
pipelines/includes/config/agent_tag.yml
.agent_tag:
  tags: [dataops-101-shared-runner]
pipelines/includes/bootstrap.yml
include:
  - /pipelines/includes/config/agent_tag.yml

variables:
  MY_NAME: Sam

stages:
  - Stage One
  - Stage Two

Introducing modularity

Now we can move more content into separate files. Note: there is no new pipeline functionality here, so every run should look the same as before. We are just giving ourselves a clean project structure for the future!

  • Create a new file pipelines/includes/config/variables.yml and move in the variables block from bootstrap.yml
  • Create a new file pipelines/includes/config/stages.yml and move in the stages block from bootstrap.yml
  • Include both files into bootstrap.yml
pipelines/includes/config/variables.yml
variables:
  MY_NAME: Sam
pipelines/includes/config/stages.yml
stages:
  - Stage One
  - Stage Two
pipelines/includes/bootstrap.yml
include:
  - /pipelines/includes/config/agent_tag.yml
  - /pipelines/includes/config/variables.yml
  - /pipelines/includes/config/stages.yml

We can also move the jobs into their own files.

  • Create a new file pipelines/includes/local_includes/say_hello.yml and move the job Say Hello into it.
  • Create a new file pipelines/includes/local_includes/say_hello_again.yml and move the job Say Hello Again into it.
  • Include both files into demo-ci.yml by adding these to the include block
demo-ci.yml
include:
  - /pipelines/includes/bootstrap.yml
  - /pipelines/includes/local_includes/say_hello.yml
  - /pipelines/includes/local_includes/say_hello_again.yml
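
The job files themselves are not listed above, so here is a sketch of what say_hello_again.yml would contain after the move, assuming the job is copied over unchanged from demo-ci.yml. The file say_hello.yml follows the same pattern with the Say Hello job.

```yaml
# pipelines/includes/local_includes/say_hello_again.yml
# The job definition is moved as-is; only its location changes.
Say Hello Again:
  extends:
    - .agent_tag
  stage: Stage Two
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"
```

After this step, demo-ci.yml holds nothing but the include block.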

Multiple jobs with variables

First, let's create some duplication. We use it to showcase variable injection later on.

  • In pipelines/includes/config/variables.yml create 3 new variables
pipelines/includes/config/variables.yml
variables:
  MY_NAME: Sam
  NAME1: Justin
  NAME2: Guy
  NAME3: Colin
  • In pipelines/includes/local_includes/say_hello.yml duplicate the current job 3 times and use the three new variables
pipelines/includes/local_includes/say_hello.yml
Say Hello:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $MY_NAME!"

Say Hello to Person 1:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME1!"

Say Hello to Person 2:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME2!"

Say Hello to Person 3:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME3!"
  • Commit and Run

Now we have far too much duplication between these jobs, so let's create another base job that holds everything they share and inherit from it.

  • Create pipelines/includes/local_includes/base_hello.yml
pipelines/includes/local_includes/base_hello.yml
.base_hello:
  extends:
    - .agent_tag
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME!"
variable reference

Did you notice that we moved everything the jobs have in common, from the agent tag to the image, into the base job? We also changed the variable name to just $NAME so that each job can override it in the next steps.

  • Then include base_hello.yml in demo-ci.yml's include section:
demo-ci.yml
include:
  - /pipelines/includes/bootstrap.yml
  - /pipelines/includes/local_includes/say_hello.yml
  - /pipelines/includes/local_includes/say_hello_again.yml
  - /pipelines/includes/local_includes/base_hello.yml

Finally, here is where the magic happens. The base job only refers to $NAME, and each job sets its own value for $NAME from one of the variables $MY_NAME, $NAME1, $NAME2, or $NAME3.

  • Amend the jobs in pipelines/includes/local_includes/say_hello.yml to extend from the base job:
pipelines/includes/local_includes/say_hello.yml
Say Hello:
  extends:
    - .base_hello
  variables:
    NAME: $MY_NAME

Say Hello to Person 1:
  extends:
    - .base_hello
  variables:
    NAME: $NAME1

Say Hello to Person 2:
  extends:
    - .base_hello
  variables:
    NAME: $NAME2

Say Hello to Person 3:
  extends:
    - .base_hello
  variables:
    NAME: $NAME3
  • Commit and Run
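
To make the inheritance chain easier to picture, the sketch below shows roughly what one job looks like once extends has been resolved. It is illustrative only and not a file you need to create. It assumes that extends merges jobs key by key and that keys set on the job itself take precedence over inherited ones; the platform's exact merge rules are not covered in this section.

```yaml
# Illustrative only: the approximate effective definition of one job
# after .base_hello and .agent_tag have been merged in.
Say Hello to Person 1:
  # inherited from .agent_tag (via .base_hello)
  tags: [dataops-101-shared-runner]
  # inherited from .base_hello
  stage: Stage One
  image: dataopslive/dataops-python3-runner:5-stable
  script:
    - echo "Hello, $NAME!"
  # set on the job itself
  variables:
    NAME: $NAME1
```

With NAME1 set to Justin in variables.yml, this job should greet Justin, while the other three jobs greet their own names using the same script line.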

Checkpoint 2

We have now built a DataOps project that:

  • Organizes key concepts such as stages, variables, and agent_tag into separate files
  • Defines multiple jobs with minimal duplication by using base job inheritance
  • Injects variable values from other variables

Your project should match the following structure now:

[Figure: 101 project structure]

Time to take a break again before you head to the next section.