Base Configuration

The base SOLE configuration that ships with the standard DataOps template contains definitions for the basic warehouses, roles, and users, and sets up the default DataOps database.

In this section, we will examine each of these configurations, which can be found in the dataops/snowflake directory of any DataOps project.

Unpacking the SOLE Configuration

Warehouses

For full details of warehouse configuration, please see the SOLE documentation (warehouse).

The DataOps template defines two warehouses, INGESTION and TRANSFORMATION. Here is an extract from warehouses.yml:

dataops/snowflake/warehouses.yml
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 60
    auto_resume: true
    namespacing: prefix

    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
...

It's worth noting from this configuration that the configured name of the warehouse is INGESTION, but the namespacing attribute is set to prefix, so the full name in Snowflake will include the DATAOPS_PREFIX at the start: DATAOPS_INGESTION by default. However, as namespacing is prefix-only, this warehouse will not have an environment-specific suffix and will therefore be shared across all environments.
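As a sketch of the alternative, a per-environment warehouse could be declared with a different namespacing value. The REPORTING warehouse below is hypothetical (not part of the template), and the value both is assumed from the SOLE documentation's namespacing options:

```yaml
warehouses:
  REPORTING:                  # hypothetical warehouse, not in the template
    comment: Per-environment reporting warehouse
    warehouse_size: XSMALL
    auto_suspend: 60
    auto_resume: true
    # "both" applies prefix AND environment suffix, e.g.
    # DATAOPS_REPORTING_PROD vs. DATAOPS_REPORTING_DEV,
    # so each environment gets its own warehouse
    namespacing: both
```

Check the SOLE documentation for the namespacing values supported by your version before relying on this.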

Roles

For full details of role configuration, please see the SOLE documentation (role).

dataops/snowflake/roles.yml
roles:
  READER:
    namespacing: prefix
    roles:
      - WRITER

  WRITER:
    namespacing: prefix
    users:
      - MASTER
      - INGESTION
      - TRANSFORMATION
    roles:
      - ADMIN

  ADMIN:
    namespacing: prefix
    users:
      - MASTER
    roles:
      - SYSADMIN

Out of the box, DataOps defines three roles: ADMIN, WRITER, and READER. The READER role is granted to WRITER, which in turn is granted to ADMIN, forming a role hierarchy. The hierarchy is completed by granting ADMIN to the externally defined SYSADMIN role.

tip

If your Snowflake security configuration requires a parent role other than SYSADMIN for integrations such as DataOps, substitute that role for SYSADMIN in the role grants for DataOps ADMIN.
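For example, the ADMIN definition in roles.yml could be adjusted as follows. DATAOPS_PARENT_ROLE is a hypothetical placeholder for whichever role your security model designates:

```yaml
roles:
  ADMIN:
    namespacing: prefix
    users:
      - MASTER
    roles:
      # Replace SYSADMIN with your organization's designated parent role
      - DATAOPS_PARENT_ROLE
```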

Note that, as with the warehouses, the namespacing for these roles is also prefix, so they too will be shared across environments.

Users

The users defined in the users.yml file mainly exist to support legacy features; out of the box, DataOps runs happily on the single SOLE admin user created as part of the DataOps setup.

Also, due to a Snowflake limitation, any users created via SOLE will need their passwords set manually after creation.

Default Database

You will notice that, out of the four standard SOLE configuration files, this one has templated content.

dataops/snowflake/databases.template.yml
databases:
  "{{ env.DATAOPS_DATABASE }}":
    {# For non-production branches, this will be a clone of production #}
    {% if (env.DATAOPS_ENV_NAME != 'PROD' and env.DATAOPS_ENV_NAME != 'QA') %}
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    {% endif %}

    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    grants:
      USAGE:
        - WRITER
        - READER
...

"{{ env.DATAOPS_DATABASE }}" - this sets the name of the default database from an automatic variable generated by DataOps internally every time the pipeline is run. By default, the database name comprises the project prefix (DATAOPS_PREFIX) followed by the environment name, so in production the database will be named DATAOPS_PROD. Also, SOLE detects this is the default database and does not attempt any namespacing (as the default name already includes this), so no namespacing attribute is necessary.

{% if ... %} - this conditional block resolves only for non-production environments (i.e. branches other than master and qa). In those environments (e.g. dev and feature branches), the default database is created as a clone of the production database, identified by the pre-computed DATAOPS_DATABASE_MASTER variable. For the master and qa branches, no database is cloned and only the content specified in this file is applied.
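To illustrate, the rendered configuration for a feature-branch environment might look something like the sketch below. The environment name FB_MY_FEATURE and the prefix DATAOPS_SOLE_TRAINING are hypothetical; the actual values depend on your project's prefix and branch-to-environment mapping:

```yaml
# Hypothetical rendered output for a feature-branch environment
databases:
  "DATAOPS_SOLE_TRAINING_FB_MY_FEATURE":
    from_database: "DATAOPS_SOLE_TRAINING_PROD"
    comment: This is the main DataOps database for environment FB_MY_FEATURE
    grants:
      USAGE:
        - WRITER
        - READER
```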

When defining databases (and schemas), remember to include USAGE grants for your roles!
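As a sketch, a schema defined alongside the database would typically carry its own USAGE grants. The STAGING schema below is hypothetical, and the nested schemas key is assumed from the SOLE documentation:

```yaml
databases:
  "{{ env.DATAOPS_DATABASE }}":
    schemas:
      STAGING:                # hypothetical schema
        comment: Landing area for raw data
        grants:
          USAGE:
            - WRITER
            - READER
```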

Exercise: Run a Pipeline and Set up Snowflake

For this exercise, you will need your clean DataOps project, freshly created from the standard template. You will also need a runner available (usually this will be defined on the parent group of the project).

Tip

If you encounter pipeline errors, and the solution is not immediately obvious, you can re-run the pipeline with the variable DATAOPS_DEBUG set to 1. This will output additional debug information into the job logs.
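If you prefer to enable debug output for every run rather than a single re-run, one approach (a sketch, assuming your project keeps its variables in pipelines/includes/config/variables.yml as described later in this page) is:

```yaml
variables:
  DATAOPS_DEBUG: 1   # remove, or set to 0, once the issue is resolved
```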

  1. Open WebIDE on the master branch.

  2. Open file pipelines/includes/config/agent_tag.yml and set your runner name.

  3. Open file pipelines/includes/config/variables.yml and set the following variables:

    • DATAOPS_PREFIX - set this to DATAOPS_SOLE_TRAINING
    • DATAOPS_VAULT_KEY - generate a string of suitably randomized characters for this
    • Secrets manager setup
  4. Open file full-ci.yml and comment out the Modelling and transformation jobs and Generate modelling and transformation documentation sections (we don't need these for this exercise).

  5. Commit to master and run pipeline full-ci.yml.

    info

    The first pipeline to run in any new DataOps project must be on the master branch. This sets up the production environment, from which other environments will be cloned.

  6. After the pipeline has run, take a look at the Snowflake setup. You should be able to see:

    • Database DATAOPS_SOLE_TRAINING_PROD with the expected grants
    • Two new warehouses
    • The new users and roles (you may need to switch to ACCOUNTADMIN)