Skip to main content

Key Concepts

Now that we've covered the need for an engine like SOLE, and looked at some of the main use cases for Snowflake object lifecycle management in DataOps, this section will start to examine the main concepts within SOLE.

Declarative Configuration

As we established back in Why SOLE?, this is a declarative approach to object management and, as such, needs a declarative configuration.

SOLE uses a YAML-based configuration format that comprises a number of files, each specifying the objects of a certain type (e.g. databases, warehouses, roles, users, etc.). These files sit together in the DataOps project, located in a specific directory (dataops/snowflake).

Configuration files can include Jinja templating that will allow inclusion of dynamic content at pipeline run time. This can include:

  • Variables from the pipeline environment and the DataOps Vault
  • Conditional statements to include parts of the configuration under certain conditions
  • Control structures that can loop through a set of values, reducing the code footprint.

If configuration files include templated content, the filename will include .template., for example databases.template.yml.

SOLE Architecture

The DataOps Snowflake Object Lifecycle Engine is built on top of the popular infrastructure-as-code toolset Terraform, using the Chan-Zuckerberg Snowflake provider, along with a YAML-based configuration compiler, enhanced logging and reporting, and many other features.

When a SOLE job runs, configuration flows through the following system components in this order:

img alt

Template Renderer

A core part of the DataOps platform, this component identifies any files that contain Jinja template content and renders them to static files. This operation will resolve any variables to their current values, optionally include any conditional content, and expand out any looped sections.

Each rendered file is saved under the original filename but without the .template modifier. The original template file is then deleted to avoid issues with the next stages.

SOLE Compiler

Next, the rendered YAML files go through a compiler that converts the syntax from the SOLE YAML format into Terraform's JSON structures, adding computed properties and other settings as required by the object type.

This is actually done in four sections, SOLE's domains of operation:

  1. Account
  2. Database
  3. Database objects (schemas, tables, etc.)
  4. Grants

It is possible to export the Terraform configurations from a SOLE job if needed, using a DataOps artefact.

Terraform

Once the Terraform configurations are ready, they are applied to Snowflake in order. In actual fact, Terraform has it's own compile, plan and apply stages, but that is beyond the scope of this guide.

As part of this stage of the process, Snowflake configuration is imported, in order to compute the changes needed. And if the SOLE configuration includes a database clone (such as in a feature branch), there is an additional import from the newly-cloned database to detect its contents.

Snowflake

Ultimately, the SOLE configuration is fully applied to the Snowflake account, with only the minimum changes necessary being made to update existing objects. The job will summarise the number of additions, changes and deletions that were applied to each domain of operation, e.g.:

|   Resource type    |       Action       |   Objects Added    |  Objects Modified  |  Objects Deleted   |
----------------------------------------------------------------------------------------------------------
| ACCOUNT_LEVEL | PLAN | 5 | 1 | 0 |
| ACCOUNT_LEVEL | APPLY | 5 | 1 | 0 |
| DATABASE | PLAN | 0 | 0 | 0 |
| DATABASE | APPLY | 0 | 0 | 0 |
| DATABASE_LEVEL | PLAN | 0 | 0 | 0 |
| DATABASE_LEVEL | APPLY | 0 | 0 | 0 |
| GRANT | PLAN | 0 | 8 | 3 |
| GRANT | APPLY | 0 | 8 | 3 |

Defining Objects

Objects are defined in the YAML configuration files as mentioned above. Here is a simple example of a warehouse object in SOLE:

dataops/snowflake/warehouses.yml
warehouses:
TRANSFORMATION:
comment: Warehouse for Transformation operations
warehouse_size: MEDIUM
auto_suspend: 60
auto_resume: true
grants:
USAGE:
- WRITER
- READER

The above configuration will create and maintain a medium warehouse in Snowflake, with the specified suspend/resume settings and access to be used by two roles.

Namespacing

In the example configuration above, the warehouse name is specified as TRANSFORMATION, but that's not the full name that will end up in Snowflake, as SOLE can apply namespacing prefix and suffix to all account-level objects.

Namespacing is important so that objects can coexist across different DataOps projects and environments. SOLE will apply a prefix, a suffix, or both to object names to achieve this.

Project Prefix

All DataOps projects define a variable named DATAOPS_PREFIX, set in pipelines/includes/config/variables.yml, that defines the main identifier for the project. The default value is usually DATAOPS, but this can (and often is) changed in projects.

This prefix can allow multiple projects to coexist within the same Snowflake account, or will at least identify account-level objects as being managed by DataOps.

Environment Suffix

When a pipeline is run, the environment is determined from the current branch - master branch is the PROD environment, dev is DEV, etc. Feature branches (e.g. any branch that is not master, qa or dev) will have a generated environment name of the form FB_BRANCHNAME.

This suffix allows each environment to have its own set of Snowflake objects, without the risk of name collisions.

Resultant Naming

So, from the warehouse example above, if our project's DATAOPS_PREFIX is set to DATAOPS and we're running a pipeline in the dev branch, the warehouse will be named DATAOPS_TRANSFORMATION_DEV.

If we merge our dev code into master and run the pipeline again, a new warehouse will be managed, named DATAOPS_TRANSFORMATION_PROD.

Notes on Namespacing

When referring to another object in an object's configuration, for example, the roles READER and WRITER in the code above, always use the base, non-namespaced name. The SOLE compiler will sort out all the resultant names for you. The exception here is when referring to an externally-managed object - see below for more on that.

It is possible to disable namespacing, either to limit to prefix- or suffix-only, or to turn it off altogether. This will be covered later.

Use with Caution!

Disabling namespacing can have an impact on your production environment, and even cause conflicts between different projects. It's there for a reason, so please ensure you understand it fully!

Object Deletion

Setting namespacing to none or prefix prevents the deletion of the object. Please refer to Environment-Specific Snowflake Objects.

note

To manage the lifecycle of the object but not its prefix and suffix, namespacing can be set to external.

Roles and Grants

As the main use case for SOLE, the management of roles and grants is core functionality and should feature in all DataOps projects. The standard SOLE configuration has a file roles.yml that can be extended to include all the roles your project needs to define, along with how those roles are assigned to each other and to users.

Then, each object has a grants block (see the example above), that then specifies which roles can enjoy which privileges on that object.

Externally-Managed Objects

Of course, DataOps will not manage every single object in your Snowflake account. Even if it's doing most things, you will still have several other objects, particularly the built-in roles, user accounts, sample data database, etc.

Fortunately, SOLE has the concept of manage_mode to define how to interact with objects outside its control. For most objects configured in SOLE, the default manage_mode is set to all, allowing SOLE full control over the object, but it is also possible to declare objects as externally managed.

For example, I may want to grant MY_ROLE to the built-in SYSADMIN role to maintain the role hierarchy:

dataops/snowflake/roles.yml
roles:
MY_ROLE:
comment: main role for my project
roles:
- SYSADMIN

SYSADMIN:
manage_mode: none
namespacing: none

Read more on manage_mode here.