Base Configuration

The base SOLE configuration that ships with the standard DataOps template contains definitions for the basic warehouses, roles, and users, and sets up the default DataOps database.

In this section, we will examine each of these configurations, which can be found in the dataops/snowflake directory of any DataOps project.

Unpacking the SOLE Configuration

Warehouses

For full details of warehouse configuration, please see the SOLE documentation (warehouse).

The DataOps template defines two warehouses, INGESTION and TRANSFORMATION. Here is an extract from warehouses.yml:

dataops/snowflake/warehouses.yml
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 60
    auto_resume: true
    namespacing: prefix

    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
...

It's worth noting from this configuration that the configured name of the warehouse is INGESTION, but the namespacing attribute is set to prefix, so the full name in Snowflake will include the DATAOPS_PREFIX at the start: DATAOPS_INGESTION by default. However, as namespacing is prefix-only, this warehouse will not have an environment-specific suffix and will therefore be shared across all environments.
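As a sketch of the alternative, a per-environment warehouse could be declared with a different namespacing value. The REPORTING warehouse below is hypothetical (not part of the template), and the value both is assumed from the SOLE documentation's namespacing options:

```yaml
warehouses:
  REPORTING:                  # hypothetical warehouse, not in the template
    comment: Per-environment reporting warehouse
    warehouse_size: XSMALL
    auto_suspend: 60
    auto_resume: true
    # "both" applies prefix AND environment suffix, e.g.
    # DATAOPS_REPORTING_PROD vs. DATAOPS_REPORTING_DEV,
    # so each environment gets its own warehouse
    namespacing: both
```

Check the SOLE documentation for the namespacing values supported by your version before relying on this.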

Roles

For full details of role configuration, please see the SOLE documentation (role).

dataops/snowflake/roles.yml
roles:
  READER:
    namespacing: prefix
    roles:
      - WRITER

  WRITER:
    namespacing: prefix
    users:
      - MASTER
      - INGESTION
      - TRANSFORMATION
    roles:
      - ADMIN

  ADMIN:
    namespacing: prefix
    users:
      - MASTER
    roles:
      - SYSADMIN

Out of the box, DataOps defines three roles: ADMIN, WRITER, and READER. The READER role is granted to WRITER, which in turn is granted to ADMIN, forming a role hierarchy. The hierarchy is completed by granting ADMIN to the externally defined SYSADMIN role.

tip

If your Snowflake security configuration requires a parent role other than SYSADMIN for integrations such as DataOps, substitute that role for SYSADMIN in the role grants for DataOps ADMIN.
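For example, the ADMIN definition in roles.yml could be adjusted as follows. DATAOPS_PARENT_ROLE is a hypothetical placeholder for whichever role your security model designates:

```yaml
roles:
  ADMIN:
    namespacing: prefix
    users:
      - MASTER
    roles:
      # Replace SYSADMIN with your organization's designated parent role
      - DATAOPS_PARENT_ROLE
```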

Note that, as with the warehouses, the namespacing for these roles is also prefix, so they too will be shared across environments.

Users

The users defined in the users.yml file mainly exist to support legacy features; out of the box, DataOps runs happily on the single SOLE admin user created as part of the DataOps setup.

Also, due to a Snowflake limitation, any users created via SOLE will need their passwords set manually after creation.

Default Database

You will notice that, out of the four standard SOLE configuration files, this one has templated content.

dataops/snowflake/databases.template.yml
databases:
  "{{ env.DATAOPS_DATABASE }}":
    {# For non-production branches, this will be a clone of production #}
    {% if (env.DATAOPS_ENV_NAME != 'PROD' and env.DATAOPS_ENV_NAME != 'QA') %}
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    {% endif %}

    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    grants:
      USAGE:
        - WRITER
        - READER
...

"{{ env.DATAOPS_DATABASE }}" - this sets the name of the default database from an automatic variable generated by DataOps internally every time the pipeline is run. By default, the database name comprises the project prefix (DATAOPS_PREFIX) followed by the environment name, so in production the database will be named DATAOPS_PROD. Also, SOLE detects this is the default database and does not attempt any namespacing (as the default name already includes this), so no namespacing attribute is necessary.

{% if ... %} - this conditional block resolves only for non-production environments (i.e. branches other than master and qa). In those environments (e.g. dev and feature branches), the default database is created as a clone of the production database, identified by the pre-computed DATAOPS_DATABASE_MASTER variable. For the master and qa branches, no database is cloned and only the content specified in this file is applied.
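To illustrate, the rendered configuration for a feature-branch environment might look something like the sketch below. The environment name FB_MY_FEATURE and the prefix DATAOPS_SOLE_TRAINING are hypothetical; the actual values depend on your project's prefix and branch-to-environment mapping:

```yaml
# Hypothetical rendered output for a feature-branch environment
databases:
  "DATAOPS_SOLE_TRAINING_FB_MY_FEATURE":
    from_database: "DATAOPS_SOLE_TRAINING_PROD"
    comment: This is the main DataOps database for environment FB_MY_FEATURE
    grants:
      USAGE:
        - WRITER
        - READER
```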

When defining databases (and schemas), remember to include USAGE grants for your roles!
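As a sketch, a schema defined alongside the database would typically carry its own USAGE grants. The STAGING schema below is hypothetical, and the nested schemas key is assumed from the SOLE documentation:

```yaml
databases:
  "{{ env.DATAOPS_DATABASE }}":
    schemas:
      STAGING:                # hypothetical schema
        comment: Landing area for raw data
        grants:
          USAGE:
            - WRITER
            - READER
```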

Exercise: Run a Pipeline and Set up Snowflake

For this exercise, you will need your clean DataOps project, freshly created from the standard template. You will also need a runner available (usually this will be defined on the parent group of the project).

Tip

If you encounter pipeline errors, and the solution is not immediately obvious, you can re-run the pipeline with the variable DATAOPS_DEBUG set to 1. This will output additional debug information into the job logs.
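If you prefer to enable debug output for every run rather than a single re-run, one approach (a sketch, assuming your project keeps its variables in pipelines/includes/config/variables.yml as described later in this page) is:

```yaml
variables:
  DATAOPS_DEBUG: 1   # remove, or set to 0, once the issue is resolved
```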

  1. Open WebIDE on the master branch.

  2. Open file pipelines/includes/config/agent_tag.yml and set your runner name.

  3. Open file pipelines/includes/config/variables.yml and set the following variables:

    • DATAOPS_PREFIX - set this to DATAOPS_SOLE_TRAINING
    • DATAOPS_VAULT_KEY - generate a string of suitably randomized characters for this
    • Secrets manager setup
  4. Open file full-ci.yml and comment out the Modelling and transformation jobs and Generate modelling and transformation documentation sections (we don't need these for this exercise).

  5. Commit to master and run pipeline full-ci.yml.

    info

    The first pipeline to run in any new DataOps project must be on the master branch. This sets up the production environment, from which other environments will be cloned.

  6. After the pipeline has run, take a look at the Snowflake setup. You should be able to see:

    • Database DATAOPS_SOLE_TRAINING_PROD with the expected grants
    • Two new warehouses
    • The new users and roles (you may need to switch to ACCOUNTADMIN)