Base Configuration
The base SOLE configuration that ships with the standard DataOps template contains definitions for the basic warehouses, roles and users, and sets up the default DataOps database.
In this section, we will examine each of these configurations, which can be found in the dataops/snowflake directory of any DataOps project.
Unpacking the SOLE configuration
Warehouses
For complete details of warehouse configuration, see the SOLE Reference Guide — Warehouse.
The DataOps template defines two warehouses, INGESTION and TRANSFORMATION. Here is an extract from warehouses.yml:
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
  ...
It's worth noting from this configuration that the configured name of the warehouse is INGESTION, but the namespacing attribute is set to prefix, so the full name in Snowflake will include the DATAOPS_PREFIX at the start: DATAOPS_INGESTION by default. However, as the namespacing is prefix-only, this warehouse will not have an environment-specific suffix and will therefore be shared across all environments.
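The template's second warehouse, TRANSFORMATION, follows the same pattern. As a rough sketch only (the actual sizes, timeouts and grants are whatever your template ships with), a second entry in warehouses.yml could look like this:

warehouses:
  TRANSFORMATION:
    comment: Warehouse for Transformation operations
    warehouse_size: MEDIUM      # illustrative value
    auto_suspend: 40            # suspend after 40 seconds of inactivity
    auto_resume: true
    namespacing: prefix         # prefix-only, so shared across environments
    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN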
Roles
Out of the box, DataOps defines three roles: ADMIN, WRITER and READER. Here's an example of the three roles you can specify in the roles.yml file. For complete details of role configuration, see SOLE Reference Guide — Role.
roles:
  READER:
    namespacing: prefix
    roles:
      - WRITER
  WRITER:
    namespacing: prefix
    users:
      - MASTER
      - INGESTION
      - TRANSFORMATION
    roles:
      - ADMIN
  ADMIN:
    namespacing: prefix
    users:
      - MASTER
    roles:
      - SYSADMIN
The READER role is granted to the WRITER role, which in turn is granted to the ADMIN role, maintaining a role hierarchy. This hierarchy is completed by granting the ADMIN role to the externally-defined SYSADMIN role.
If your Snowflake security configuration requires a parent role other than SYSADMIN for integrations such as DataOps, substitute that role for SYSADMIN in the role grants for DataOps ADMIN.
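For example, here is a minimal sketch of the ADMIN entry in roles.yml with a hypothetical parent role named INTEGRATION_PARENT standing in for SYSADMIN (use whatever parent role your security model actually defines):

roles:
  ADMIN:
    namespacing: prefix
    users:
      - MASTER
    roles:
      - INTEGRATION_PARENT    # hypothetical replacement for SYSADMIN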
Note that, similar to the warehouses, the namespacing for these roles is also prefix, so they too will be shared across environments.
Users
Here's an example of possible properties you can define in the users.yml file. For complete user configuration details, see SOLE Reference Guide — User.
users:
  JOHNDOE:
    namespacing: none
    login_name: "JOHNDOE"
    disabled: false
    display_name: "JOHNDOE"
    first_name: "John"
    last_name: "Doe"
    email: "john.doe@dataops.live"
    must_change_password: false
    rsa_public_key: "example_public_key"
    default_warehouse: COMPUTE_WH
    default_role: ROLE_FOR_JOHN_DOE
    comment: User Login for DataOps using Key Pair Auth
The users defined in the users.yml file are intended to support legacy features; DataOps can run on the single, out-of-the-box SOLE admin user created as part of the DataOps setup. Any users created via SOLE must have their password set manually following their creation.
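If you do create an extra user this way and want it to pick up the template roles, the grant is made from the role side: listing a user under a role's users key grants that role to the user, as the WRITER example above shows. A minimal sketch, assuming you want JOHNDOE to receive the (prefixed) READER role:

roles:
  READER:
    namespacing: prefix
    users:
      - JOHNDOE    # grants the prefixed READER role to this user
    roles:
      - WRITER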
Default database
Out of the four standard SOLE configuration files, the default database file has templated content.
databases:
  "{{ env.DATAOPS_DATABASE }}":
    {# For non-production branches, this will be a clone of production #}
    {% if (env.DATAOPS_ENV_NAME != 'PROD' and env.DATAOPS_ENV_NAME != 'QA') %}
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    {% endif %}
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    grants:
      USAGE:
        - WRITER
        - READER
      ...
"{{ env.DATAOPS_DATABASE }}"
- this sets the name of the default database from an automatic variable generated
by DataOps internally every time the pipeline runs. By default, the database name comprises the project prefix
(DATAOPS_PREFIX
) followed by the environment name, so in production, the database will be named DATAOPS_PROD
.
Also, SOLE detects this is the default database and does not attempt any namespacing (as the default name already
includes this), so no namespacing
attribute is necessary.
{% if ... %} - this conditional block only resolves for development branches (i.e. not main or qa). This means that for any branch other than main or qa (e.g. dev or feature branches), the default database will be created as a clone of the production database (identified by the pre-computed DATAOPS_DATABASE_MASTER variable). For the main and qa branches, no database will be cloned and just the content specified in this file will be added.
When defining databases (and schemas), remember to include USAGE grants for your roles.
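As an illustration, here is a minimal sketch of a schema added under the default database with its own USAGE grants. The schema name SAMPLE_SCHEMA is hypothetical, and this assumes schemas are nested under the database entry, following the same grants layout as above:

databases:
  "{{ env.DATAOPS_DATABASE }}":
    schemas:
      SAMPLE_SCHEMA:            # hypothetical schema name
        comment: Example schema managed alongside the default database
        grants:
          USAGE:
            - WRITER
            - READER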
Exercise: Run a pipeline and set up Snowflake
For this exercise, you will need your clean DataOps project, freshly created from the standard template. You will also need a runner available (usually this will be defined on the parent group of the project).
If you encounter pipeline errors and the solution is not immediately obvious, you can re-run the pipeline with the variable DATAOPS_DEBUG set to 1. Doing so will output additional debug information into the job logs. For additional confidentiality, all secret values will be masked in the log output.
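A convenient place to set this is the same variables file you will edit later in this exercise. A minimal sketch, assuming the file uses a standard GitLab-style variables block:

variables:
  DATAOPS_DEBUG: 1    # verbose job logging; remove once the problem is resolved

Alternatively, you can usually supply DATAOPS_DEBUG as a one-off variable when triggering the pipeline manually from the CI/CD interface.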
- Open the Web IDE on the main branch.
- Open file pipelines/includes/config/agent_tag.yml and set your runner name.
- Open file pipelines/includes/config/variables.yml and set the following variables (a sketch of both edited files appears after this list):
  - DATAOPS_PREFIX - set this to DATAOPS_SOLE_TRAINING
  - DATAOPS_VAULT_KEY - generate a string of suitably randomized characters for this (see Secrets manager setup)
- Open file full-ci.yml and comment out the Modelling and transformation jobs and Generate modelling and transformation documentation sections (we don't need these for this exercise).
- Commit to main and run pipeline full-ci.yml.
  Info: The first pipeline to run in any new DataOps project must be on the main branch. This sets up the production environment, from which other environments will be cloned.
- After the pipeline has run, take a look at the Snowflake setup. You should be able to see:
  - Database DATAOPS_SOLE_TRAINING_PROD with the expected grants
  - Two new warehouses
  - The new users and roles (you may need to switch to ACCOUNTADMIN)
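As a rough guide, and assuming the standard template layout (in particular, that agent_tag.yml defines the hidden .agent_tag job with a GitLab tags list), the two edited include files might end up looking something like this, with the runner tag as a placeholder for your own runner's name:

# pipelines/includes/config/agent_tag.yml
.agent_tag:
  tags:
    - my-dataops-runner    # placeholder: use your runner's actual tag

# pipelines/includes/config/variables.yml
variables:
  DATAOPS_PREFIX: DATAOPS_SOLE_TRAINING
  DATAOPS_VAULT_KEY: "<generate-a-long-random-string>"    # placeholder value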