
Adding More Configuration

Now that you have become familiar with the base SOLE configuration, we can work through a few exercises that add new objects in SOLE. You will need to continue using the DataOps project from the previous section.

We will use the Web IDE for file editing in these exercises, although you can, of course, use a different IDE if you wish.

Exercise 1: Schemas and Tables

For this section, we will set up a schema and a couple of tables.

  1. Open file dataops/snowflake/databases.template.yml and add the following schemas section after the database grants:

    dataops/snowflake/databases.template.yml

    ```yaml
    ...
        grants:
          ...

        schemas:
          TRAINING:
            tables:
              TRAINING_TABLE1:
                columns:
                  COL1:
                    type: INT
                  COL2:
                    type: VARCHAR
    ```
  2. Commit to master and run pipeline full-ci.yml.

  3. Open Snowflake and observe the new schema and table.

  4. Now, return to databases.template.yml and add a second table, named TRAINING_TABLE2. Set up any columns you wish - refer to the SOLE table reference for full configuration details.

    Why not try...

    ...adding a primary key to your new table?

  5. Commit and run the pipeline as before.
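If you take up the primary-key suggestion, your second table might look something like the sketch below. The column names (ID, DESCRIPTION) are just examples, and the exact shape of the primary_key block should be confirmed against the SOLE table reference:

```yaml
              TRAINING_TABLE2:
                columns:
                  ID:
                    type: INT
                  DESCRIPTION:
                    type: VARCHAR
                # Sketch only - check the SOLE table reference for
                # the exact primary-key option names.
                primary_key:
                  keys:
                    - ID
```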

Exercise 2: Production-Only Warehouse

Now let's venture a little into the realm of Jinja templating, and create a warehouse that will only exist in the PROD environment.

  1. Rename file dataops/snowflake/warehouses.yml to warehouses.template.yml

  2. Open the file and add the following new warehouse configuration to the bottom of the file:

    dataops/snowflake/warehouses.template.yml

    ```yaml
      PRODUCTION_WH:
        comment: Production use only!
        warehouse_size: XSMALL
        auto_suspend: 60
        auto_resume: true
        namespacing: prefix
    ```
  3. Commit to master and run pipeline full-ci.yml.

  4. Take a look at the new warehouse in Snowflake.

  5. Now, let's add the magic! Alter the new warehouse config so it looks like this:

    dataops/snowflake/warehouses.template.yml

    ```yaml
      {% if env.DATAOPS_ENV_NAME == 'PROD' %}
      PRODUCTION_WH:
        comment: Production use only!
        warehouse_size: XSMALL
        auto_suspend: 60
        auto_resume: true
        namespacing: prefix
      {% endif %}
    ```
    caution

    Be careful not to introduce any additional indenting of content within {% %} blocks. Extra indenting might make the code slightly more readable, but it can break the YAML structure!
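    To see why this matters: when the template is rendered, the {% %} lines themselves are removed, but (depending on the renderer's whitespace settings) the leading spaces on the lines between them are kept. A sketch of the mistake, with the config abbreviated:

    ```yaml
    # BAD: the warehouse block has been indented an extra level
    # inside the {% if %}. After rendering, PRODUCTION_WH sits one
    # indent level deeper than its sibling warehouses, so the YAML
    # no longer parses as intended.
      {% if env.DATAOPS_ENV_NAME == 'PROD' %}
        PRODUCTION_WH:
          comment: Production use only!
      {% endif %}
    ```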

  6. Commit to master and run pipeline full-ci.yml again.

  7. SOLE should not make any changes - as far as it's concerned nothing has changed, as we're still running in the production environment. We will check on this warehouse in the next exercise to make sure it doesn't pop up in a non-production environment!

Why not try...

...adding some grants to your new warehouse so that READER and WRITER can use it?
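As a starting point for that suggestion, SOLE warehouse grants are typically expressed as a map of privilege to roles. This is a sketch only: the grants key layout should be confirmed against the SOLE warehouse reference, and READER and WRITER are assumed to be roles that already exist in your project:

```yaml
  {% if env.DATAOPS_ENV_NAME == 'PROD' %}
  PRODUCTION_WH:
    comment: Production use only!
    warehouse_size: XSMALL
    auto_suspend: 60
    auto_resume: true
    namespacing: prefix
    # Sketch: grant USAGE on the warehouse to the two roles.
    grants:
      USAGE:
        - READER
        - WRITER
  {% endif %}
```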

Exercise 3: What Happens in Dev?

So far, we have limited ourselves to committing directly to master (production) and running pipelines there. Now, we will create a development branch and see what happens.

  1. Open your project's main page and navigate, using the left-hand bar, to Repository > Branches.

  2. Create a new branch called dev from master.

  3. Navigate to CI / CD > Pipelines and check that the full-ci.yml pipeline is running in dev. If not, click Run pipeline and start full-ci.yml in dev.

  4. Once the pipeline completes (it will show as Blocked, as the final job is manual), take a look in Snowflake. There should be a new database named DATAOPS_SOLE_TRAINING_DEV, a clone of the production database (check that it has the same schemas/tables), but no new warehouse, since that has been configured to exist only in the production environment.

Why not try...

...creating a feature branch from dev and running the pipeline there. What do the new Snowflake objects look like?

Exercise 4: Making a Change in Dev

Now that we have a development branch, and a handy cloned database to work in, let's create something new in there.

  1. In the Web IDE, switch to the dev branch.

  2. Open file dataops/snowflake/databases.template.yml and add the following stages block to the TRAINING schema, below the tables section (be careful with your indenting!):

    dataops/snowflake/databases.template.yml

    ```yaml
            tables:
              ...

            stages:
              TRAINING_STAGE:
                comment: Created for training purposes
                url: s3://no-such-bucket/no-such-path
    ```
  3. Commit to dev and run pipeline full-ci.yml.

  4. Once the pipeline completes, take a look at your new stage in the DATAOPS_SOLE_TRAINING_DEV database.

Why not try...

...creating a file format or two as well?
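If you try that, a file format sits alongside the stages block in the schema. The sketch below uses an illustrative name (TRAINING_CSV_FORMAT), and the exact option names should be confirmed against the SOLE file format reference:

```yaml
            stages:
              ...

            # Sketch: a simple CSV file format in the TRAINING schema.
            file_formats:
              TRAINING_CSV_FORMAT:
                format_type: CSV
                skip_header: 1
```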

Exercise 5: Merge into Production

Let's say that we've completed developing all our new database features, and they have been thoroughly tested (out of scope of this guide). We'll need to raise a Merge Request (MR) into master (production environment).

Well, actually...

In most real-world projects there will be one or more test/pre-production environments in between. Let's pretend that's already happened!

  1. Open your project's main page and navigate, using the left-hand bar, to Merge Requests.

  2. Click Create merge request and create a new MR from the dev branch into master.

  3. Enter Production Release for the title, anything you like for the description, and assign the MR to yourself.

  4. Important: Make sure the option to delete the source branch is unchecked.

    Otherwise...

    ...you won't be able to run the tear-down in the next exercise, as your dev branch will have gone!

  5. Click Submit to raise the request. If you're not that familiar with Merge Requests, feel free to explore the MR form, especially the Changes tab.

  6. Pretend to be someone else and merge the MR with the Merge button.

  7. Navigate to CI / CD > Pipelines and run a new pipeline for the master branch (full-ci.yml as before).

  8. Once the pipeline completes, take a look at your new features in the DATAOPS_SOLE_TRAINING_PROD database.

Exercise 6: Tear Down Dev

Assuming you didn't allow the MR process in the previous exercise to delete the dev branch, you can now go back, clean up all your development resources, and then delete the branch itself.

info

In most DataOps projects, the dev branch is considered persistent and will not normally be cleaned up and deleted. We're simplifying this a bit for this guide, so go with us!

  1. Navigate to CI / CD > Pipelines and find the most recent dev branch pipeline.

  2. Click on the ⚙ Blocked status button to enter the pipeline details.

  3. Noting that it has not yet run, click the run (play) button on the Tear Down Snowflake job.

  4. Once the job completes, take a look in Snowflake. The DATAOPS_SOLE_TRAINING_DEV database should have been dropped, along with any other DEV-suffixed resources (if you created any).

  5. Now, the dev branch can be safely deleted. Navigate to Repository > Branches and delete it.