DataOps Snowflake Object Lifecycle Engine

The Declarative way to drive your entire Snowflake estate

The Snowflake Object Lifecycle Engine (SOLE) manages the lifecycle of Snowflake Objects defined in a configuration specified by the user. The full list of supported objects is at Supported Snowflake Objects.

Users can define the configuration of Snowflake objects in a YAML file, which is a human-readable format. The engine compiles the configuration to work out dependencies, and resolves the environment and namespace to effectively manage the specified objects.

This operation can be divided into two distinct actions:

  • Compilation
  • Execution

The orchestration of the actions is controlled by the Snowflake Object Lifecycle Orchestrator. The orchestrator performs the specified action on the specified resource group.

Compilation

This action reads the user-specified configuration describing the Snowflake infrastructure and generates a machine-readable configuration that can be executed to enact those specifications.

This means collecting all configuration files, namespacing objects as per the environment, and resolving dependencies within objects. Upon successful compilation, resource definition and import files are generated and saved in resource-group-specific folders.

Execution steps can now be triggered to manage the Snowflake Objects.

Execution

The Execution phase of the Snowflake Object Lifecycle Engine covers inspecting Snowflake to get the state of Objects and applying changes where required.

Inspection collects the current state of the specified Snowflake Objects and saves it into a local state. This local state is then used to plan the changes required to make each Object match the user configuration.

Upon applying, the Snowflake Object Lifecycle Engine enacts the plan against the Snowflake infrastructure.

The actions that make up the Execution phase are described under Lifecycle Actions below.

Resource Groups

The Snowflake Object Lifecycle Engine divides Snowflake Objects into four resource groups. The execute action is performed on each resource group in sequential order.

The resource groups are:

ACCOUNT_LEVEL

This resource group refers to all Account-Level Snowflake Objects supported by the Snowflake Object Lifecycle Engine; see Supported Snowflake Objects for the full list.

The generated resource definition and import statement files are stored in a directory named account_level inside the cache directory. This resource group should run at the initial stages of a pipeline in parallel with databases.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to ACCOUNT_LEVEL commands the engine to execute actions such as PLAN and APPLY for Account-Level objects.
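For illustration, a minimal sketch of a pipeline job that plans Account-Level objects follows; the job name, stage name, and /dataops script entry point are assumptions about the surrounding DataOps project setup, not prescribed values:

Plan Account Level Objects:
  stage: Snowflake Setup
  variables:
    LIFECYCLE_ACTION: PLAN
    LIFECYCLE_MANAGE_OBJECT: ACCOUNT_LEVEL
  script:
    - /dataops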

DATABASE

This resource group is used just for the management of databases.

The generated resource definition and import statement files are stored in a directory named database inside the cache directory. This resource group should run at the initial stages of a pipeline in parallel with account_level.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to DATABASE commands the engine to execute actions such as PLAN and APPLY for Databases.

DATABASE_LEVEL

This resource group refers to all Database-Level Snowflake Objects supported by the Snowflake Object Lifecycle Engine; see Supported Snowflake Objects for the full list.

The generated resource definition and import statement files are stored in a directory named database_level inside the cache directory. This resource group should run after APPLY has been executed for both databases and account_level objects. This ensures that all referenced Account-Level objects already exist and that Database-Level objects created indirectly (as a result of a cloned database) are managed effectively.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to DATABASE_LEVEL commands the engine to execute actions such as PLAN and APPLY for Database-Level objects.

GRANT

This resource group refers to all privilege grants (and revokes) supported by the Snowflake Object Lifecycle Engine; see Supported Snowflake Objects for the objects that support privilege grants.

The generated resource definition and import statement files are stored in a directory named grants inside the cache directory. This resource group should run after APPLY has been executed for account-level, database, and database-level.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to GRANT commands the engine to execute actions such as PLAN and APPLY for Grants.

Supported Parameters

  • LIFECYCLE_ACTION (REQUIRED): Must be one of COMPILE, VALIDATE, PLAN, APPLY, PLAN-DESTROY, DESTROY, AGGREGATE, AGGREGATE-DESTROY
  • DATAOPS_SOLE_ACCOUNT (REQUIRED): Snowflake account. If a region-less or organization account name is not used, the region must be included in this variable as well
  • DATAOPS_SOLE_USERNAME (REQUIRED): Snowflake username
  • DATAOPS_SOLE_PASSWORD (REQUIRED): Snowflake password
  • DATAOPS_SOLE_ROLE (REQUIRED): Snowflake role with which to run the queries
  • DATAOPS_SOLE_WAREHOUSE (OPTIONAL): Snowflake warehouse used for SOLE Grant Management
  • DATAOPS_DATABASE (REQUIRED): Name of the default database. Required if the default database is referenced in the configuration in the format {{env.DATAOPS_DATABASE}}
  • DATAOPS_PREFIX (REQUIRED): Prefix added to Account-Level objects and Databases (except the default database)
  • DATAOPS_ENV_NAME (REQUIRED): Suffix added to Account-Level objects and Databases (except the default database)
  • CONFIGURATION_DIR (OPTIONAL): Path of the directory in which the configuration files are present. Required if LIFECYCLE_ACTION is set to COMPILE
  • DATAOPS_NONDB_ENV_NAME (OPTIONAL): Suffix override. If specified, it overrides all branch-specific suffixes with the given value
  • DATAOPS_DATABASE_MASTER (OPTIONAL): Name of the PROD or master database. Required if the production database is referenced in the configuration via {{env.DATAOPS_DATABASE_MASTER}}
  • LIFECYCLE_MANAGE_OBJECT (OPTIONAL): Resource group to be managed. Required if LIFECYCLE_ACTION is one of PLAN, APPLY, PLAN-DESTROY, DESTROY. Must be one of ACCOUNT_LEVEL, DATABASE, DATABASE_LEVEL, GRANT
  • LIFECYCLE_RESOURCE_FILE (OPTIONAL, defaults to resources.tf.json): Name of the file that contains resource definitions
  • LIFECYCLE_IMPORT_FILE (OPTIONAL, defaults to resource_imports.json): Name of the file that contains resource import statements
  • PROVIDER_DIR (OPTIONAL, defaults to /snowflake-provider): Directory path of the provider configuration. The directory must contain a file named provider.tf listing the providers
  • DISABLE_PERSISTENT_CACHE (OPTIONAL): If set, disables persistent cache usage by the orchestrator. In that case, the user must handle the transfer of generated files between jobs
  • ARTIFACT_DIRECTORY (OPTIONAL, defaults to CI_PROJECT_DIR/snowflake-provider, where CI_PROJECT_DIR is the path of the repository): Path to which artifacts such as the resource file, import file, and state files are uploaded. The directory must also be set to upload as an artifact in the DataOps job
  • LIFECYCLE_STATE_RESET (OPTIONAL): If set, removes the local state file and re-initializes all defined and managed objects
  • DONT_USE_SOLE_GRANT_MANAGEMENT (OPTIONAL): If set, disables SOLE Grant Management for privilege/role grants and revokes and uses the underlying Terraform provider instead
  • DATAOPS_SOLE_OVERRIDE (OPTIONAL): If set, overrides the object delete/update prevention
  • DATAOPS_SOLE_PROD_BRANCH (OPTIONAL, defaults to master): Specifies the branch used as the PROD environment
  • DATAOPS_DEBUG (OPTIONAL): If set, enables debug logs for the Engine. Note that this prints potentially sensitive information, such as credentials, in the job logs
  • DATAOPS_SOLE_DEBUG (OPTIONAL): If set, enables debug logs for SOLE without exposing credentials in the logs
  • SOLE_LOG_CLEANUP (OPTIONAL): If set, removes all log files from the cache on the host system
  • SOLE_CACHE_CLEANUP (OPTIONAL): If set, removes all cache present on the host system
  • DATAOPS_PRESERVE_OBJECT_NAME (OPTIONAL): If set, enables special character support. Refer to Special Character Support before use
note

Setting DATAOPS_SOLE_DEBUG to any value enables debug mode for SOLE. This generates more log output about its internal processing, which is useful for debugging when a job is behaving unexpectedly or failing.

No credentials are logged when using DATAOPS_SOLE_DEBUG; credentials used by SOLE are masked with a fixed-length string of 16 X characters.

Configuration

The Snowflake Object Lifecycle Engine supports defining multiple files to read for Snowflake Object configuration. Users can specify their Snowflake configuration in YAML as a human-readable, declarative format.

The YAML configuration files have the following reserved keywords under which the Snowflake Object definitions must be defined:

  • account (Map/Dictionary): Defines the configuration of the Account
  • databases (Map/Dictionary): Defines the configuration of Databases
  • resource_monitors (Map/Dictionary): Defines the configuration of Resource Monitors
  • roles (Map/Dictionary): Defines the configuration of Roles
  • shares (Map/Dictionary): Defines the configuration of Shares
  • users (Map/Dictionary): Defines the configuration of Users
  • warehouses (Map/Dictionary): Defines the configuration of Warehouses
  • include (List): Defines a list of files to be included by the engine
  • specified_roles (List): Defines a list of Roles on which SOLE Grant Management can act
  • specified_users (List): Defines a list of Users on which SOLE Grant Management can act

Snowflake Object definition files are stored in the directory specified by the CONFIGURATION_DIR parameter. The directory can contain one or more files with either the .yml or .yaml file extension. You can specify all objects in a single file, but for better maintainability, manage them in separate files that logically group objects, e.g. all roles in one file, all users in another. The engine merges the object definitions from all files into a single declarative configuration.

dataops/snowflake/objects.yml
databases:
  DATAOPS_DATABASE:
    schemas:
      STAGING: ...

warehouses:
  INGESTION: ...
Multiple Definitions of Snowflake objects

If multiple files contain definitions of the same Snowflake Object, the engine only uses the configuration from the file read last; in other words, the earlier definitions are overridden.
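As a hypothetical illustration (the file names and the ANALYST role are invented for this example, and roles_b.yml is assumed to be read after roles_a.yml), only the second definition takes effect:

dataops/snowflake/roles_a.yml
roles:
  ANALYST:
    comment: "Read first - overridden"

dataops/snowflake/roles_b.yml
roles:
  ANALYST:
    comment: "Read last - this definition wins"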

Similar to DataOps pipeline files, the engine supports including other configuration files from different directories by using the keyword include and passing a list of files in it.

snowflake.yml
include:
  - "/snowflake/ingest/database.yml"
  - "/snowflake/sharing/database.yml"
Case-Insensitive object names

Snowflake Object names specified in the configuration are case-insensitive: warehouse_1 is converted to WAREHOUSE_1. If such an object is referenced as a parameter in another object, the name has to be given in upper-case; otherwise, the SOLE compiler assumes the reference points to a non-managed object.
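As a sketch of this behavior (the task and its parameters below are illustrative assumptions, not taken from a real project):

dataops/snowflake/objects.yml
warehouses:
  warehouse_1:     # compiled to WAREHOUSE_1
    comment: "Ingestion warehouse"

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        tasks:
          TASK_1:
            warehouse: WAREHOUSE_1     # upper-case reference resolves to the managed warehouse above
            schedule: 5 MINUTE
            sql_statement: SELECT 1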

Special Character Support

By default, SOLE enforces Snowflake identifier rules on object names, meaning:

  • Object names are converted to uppercase
  • Object names cannot start with a number
  • Only alphanumeric characters (A-Z and 0-9) and underscores (_) are supported in object names

Special character and case-sensitive identifier support can be enabled in SOLE with the parameter DATAOPS_PRESERVE_OBJECT_NAME.

note

SOLE currently doesn't support the following special characters in object identifiers, including table columns:

  • "
  • '
  • \
  • |
info

Snowflake currently doesn't support special characters as an identifier for the following objects:

  • Function
  • External Function
  • Procedure

When this parameter is set, object names defined in the SOLE config are used as-is, without any modification. If an object name is configured in lower-case, the case is preserved. This means that the object names object_1, OBJECT_1 and Object_1 would create three different objects, which have to be referenced and used as such.

tip

Snowflake converts object names to uppercase by default. To use case-sensitive object names (as well as special characters), the names have to be enclosed in double quotes (").

SELECT * FROM "database_1"."Schema_2"."Table-3";
caution

If you rely on SOLE converting object names to uppercase (database_1 -> DATABASE_1), enabling this flag disables that conversion and instructs SOLE that DATABASE_1 needs to be deleted and database_1 added.

This functionality also disables the object identifier rule verification performed by SOLE. If an unsupported identifier is used for an object, the APPLY job fails due to an error in Snowflake.

caution

Using this functionality leads to the creation of objects with special characters and case-sensitive identifiers. Such objects might not work out of the box with all orchestrators and all queries.

Access Management

Users can define privilege grants to roles in the configuration of a Snowflake Object. Using this, the engine handles the generation of all grants and revokes on the object.

As the Snowflake Object Lifecycle Engine follows a declarative method for defining the Snowflake Objects, the engine updates the objects to match the configuration. This means that only the specified privileges are granted to specified roles. Revokes are added implicitly for all unspecified privileges.

tip

The implicit revoke only takes effect when the keyword grants is used in the configuration of the object. If the keyword is not specified, the grants and revokes of privileges are not managed.

In the example below, the MODIFY privilege for role ROLE_1 is specified in the grants section of the configuration of DATABASE_1, while no grants section is defined for DATABASE_2.

dataops/snowflake/database.yml
DATABASE_1:
  comment: "DATABASE 1"
  grants:
    MODIFY:
      - ROLE_1
DATABASE_2:
  comment: "DATABASE 2"

Using this configuration, the engine would generate revokes on the privileges MONITOR, USAGE, CREATE SCHEMA, and IMPORTED PRIVILEGES for the database DATABASE_1. As no grants section is defined in the configuration of DATABASE_2, no revokes are performed for this database.

Revoking all Privileges

To manage revokes on all the privileges of a Snowflake Object, an empty grants section can be specified in the configuration of said Object.

dataops/snowflake/database.yml
DATABASE_3:
  comment: "DATABASE 3"
  grants:

Implicit Grant of USAGE Privilege on Parent Objects

If a privilege is granted to a role on a Database Object (Schema, Table, View, etc.) but USAGE is not granted on the parent objects, the role would not be able to access the Snowflake Object. In such cases, USAGE on the parents would have to be granted to the roles as well.

To reduce the number of configurations to be managed, the engine can implicitly grant USAGE on parent (and grandparent) objects to such roles.

dataops/snowflake/database.yml
DATABASE_5:
  comment: "Database 5"
  grants:
    MODIFY:
      - ROLE_1
    MONITOR:
      - ROLE_1
      - ROLE_2
  schemas:
    SCHEMA_1:
      comment: Hello
      grants:
        MODIFY:
          - ROLE_1
      tables:
        TABLE_1:
          grants:
            SELECT:
              - ROLE_1
              - ROLE_2
              - SYSADMIN
            INSERT:
              - ROLE_2
          comment: Hello_1
          COLUMNS:
            COLUMN_1:
              type: VARCHAR(16777216)

In the above example, on the table TABLE_1, SELECT is granted to the roles ROLE_1, ROLE_2 and SYSADMIN, and INSERT is granted to ROLE_2. As USAGE is not granted to any of these roles on either the schema SCHEMA_1 or the database DATABASE_5, the roles would not be able to access the table TABLE_1.

In this case, the engine would implicitly grant USAGE to roles ROLE_1, ROLE_2 and SYSADMIN on both SCHEMA_1 and DATABASE_5.

Manage Mode

The Snowflake Object Lifecycle Engine supports defining the manage_mode of a specified object. Configuring the manage mode allows users to specify which properties of an object are managed by the engine and which are ignored.

This is essential if only the grants of an object are to be managed, or if a child object (Table, Pipe) is managed but its parent (Schema) is not.

Example:

dataops/snowflake/database.yml
databases:
  DATABASE_1:
    manage_mode: all
    comment: "Test Database 1"
    grants:
      USAGE:
        - ROLE_1
        - SYSADMIN
  DATABASE_2:
    manage_mode: grants
    comment: "Test Database 2"
    grants:
      USAGE:
        - ROLE_1
        - SYSADMIN
  DATABASE_3:
    comment: "Test Database 3"
    manage_mode: none
    schemas:
      SCHEMA_1:
        comment: "Test Schema"

With this configuration,

  • for DATABASE_1, all specified properties would be managed. That means both the comment and defined grants.
  • for DATABASE_2, only the grants would be managed. That means only the defined grants; creation and updates to match database properties like the comment are ignored.
  • for DATABASE_3, none of the parameters would be managed. That means neither the comment nor defined grants.
  • for schema SCHEMA_1 defined inside the database, the default manage_mode all applies and the comment property is managed.

The engine supports the following manage modes:

All

When all the parameters of an object have to be managed, setting manage_mode to all informs the engine that both the resource and the grant definitions have to be generated for the object.

As all is the default value for manage_mode, it does not have to be explicitly defined.

Grants

If only the defined grants have to be managed for the object, setting manage_mode to grants informs the engine that only grant definitions, or revokes thereof, have to be generated for the object.

None

For database-level objects, such as Schema, Table, Stage, etc., their parent object has to be defined in the configuration. Schemas have to be defined inside a database. Tables, Stages, Masking Policies, etc. have to be defined inside a schema.

In case only a child object has to be managed, setting manage_mode to none for the parent object informs the engine that neither the resource nor the grant definition has to be generated for it. Definitions are only generated for the child objects, as in the sketch below.
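A minimal sketch, assuming a pre-existing database that SOLE should leave untouched while managing a new schema inside it (the object names are illustrative):

dataops/snowflake/database.yml
databases:
  EXISTING_DB:
    manage_mode: none     # neither resource nor grant definitions are generated for the database
    schemas:
      NEW_SCHEMA:         # only this child object is managed
        comment: "Managed by SOLE"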

info

manage_mode has no option to manage only the parameters of an object and ignore the grants, as this can be achieved by simply not defining grants in the configuration. See the Access Management section for more information.

State Management

Import, Creation and Update save the current state of the Snowflake objects into the local state. This local state is saved in the persistent cache of the DataOps runner's host system.

This allows the engine to use the existing state of the objects for subsequent pipelines.

Each environment and branch has its own set of states. Furthermore, each resource group has its version of the state.

The state file is present on the host system at the following path: /<cache_directory>/persistent_cache/<dataops_project_name>/<branch_name>/snowflakelifecycle-runner/<resource_group>/<resource_group>.tfstate

The values in <> refer to:

  • cache_directory: Path on the host system where the runner's cache is stored
    • The default is /agent_cache
    • To get the value for your DataOps runner, see the volume mounts in /srv/<agent_name>/config/config.toml
    • Refer to the initial setup of the DataOps runner
  • dataops_project_name: Name of the project in lower-case
  • branch_name: Name of the branch in lower-case
  • resource_group: Name of the resource group
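As a hypothetical example, assuming the default cache directory, a project named my-project, the branch dev, and the DATABASE_LEVEL resource group, the state file would be at:

/agent_cache/persistent_cache/my-project/dev/snowflakelifecycle-runner/database_level/database_level.tfstate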
note

The state is not maintained for Grants.

State Reset

If the state file is corrupted or does not match Snowflake, failures can occur in pipelines.

In such scenarios, the local state file can be reset to perform a fresh inspection of Snowflake. Doing so deletes the existing local state for the specified resource group and imports all managed Snowflake Objects again.

The reset can be triggered by the variable LIFECYCLE_STATE_RESET.

The state reset can be configured either at the project level in pipelines/includes/config/variables.yml, to reset the state of all resource groups, or at the individual resource group level as a variable on the PLAN job for that resource.
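A minimal sketch of the project-level option, assuming the standard layout of the variables file:

pipelines/includes/config/variables.yml
variables:
  LIFECYCLE_STATE_RESET: 1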

info

Individual resource group state reset is not supported in Aggregate Jobs

In-Place Changes

Snowflake uses aliases for data types, such as TEXT for VARCHAR(16777216). Using such aliases in the configuration leads to the orchestrator logging in-place changes. To avoid these in-place changes, use the exact data types in the configuration as well.
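For example, an illustrative column definition using the exact type rather than its alias:

tables:
  TABLE_1:
    COLUMNS:
      COLUMN_1:
        type: VARCHAR(16777216)     # using the alias TEXT here would be reported as an in-place change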

For Resource Monitors, the Snowflake docs state that a TIMESTAMP can be used for the parameter start_timestamp, but using a timestamp leads to an error when creating or updating a resource monitor. Currently, a resource monitor only accepts a DATE as a valid value at creation time, yet upon successful creation the value is converted to a timestamp. This leads to the engine marking the resource monitor as having an in-place change to replace the TIMESTAMP with a DATE again.

Namespace and Environment Management

The Snowflake Object Lifecycle Engine allows control over namespacing and environment for all account-level objects. Namespacing separates objects in different branches (for example, QA, dev, and feature branches) from each other so that each branch can work on its objects.

Environment Management controls the environments in which objects are managed. This feature ensures that objects created in a specific environment, for example, Prod or QA, are only accessed and managed by the environment that created them.

Namespace Management controls the addition of prefixes and suffixes to Account-Level objects and Databases, except the default database.

Environment Management

Environment Management control can be applied to Account-level objects. This allows configuring under which environment an object is managed by the engine.

This can be set by adding environment in the properties section of an account-level object and setting its value to one of the environment names, typically one of PROD, QA, or DEV.

Example:

dataops/snowflake/objects.yml
roles:
  ADMIN:
    environment: PROD
  QA:
    environment: QA
databases:
  DB:
    environment: DEV

With this configuration,

  • Role ADMIN is only managed when the environment is PROD (master branch)
  • Role QA is only managed when the environment is QA (qa branch)
  • Database DB is only managed when the environment is DEV (dev branch)

If the environment parameter is not specified, the object is created in all environments.

danger

Setting the environment: property on an object that already exists in the configuration causes that object to be removed from other environments EVEN IF the namespacing is set to prevent deletion. For example, setting environment: PROD causes the object to be removed in feature, QA, and dev branches.

To avoid this, set the LIFECYCLE_STATE_RESET variable for the first pipeline execution after setting the environment: property. This clears the SOLE state file, so SOLE no longer removes the object. This applies to any branch that the change to environment: is merged into; for example, if changes from the dev or master branch are merged into a feature branch, LIFECYCLE_STATE_RESET should be set on at least the next pipeline run on that feature branch.

Namespace Management

Specifying namespacing controls whether a prefix and/or a suffix is added to Account-Level objects and Databases (except for the default database).

Set namespacing in the properties section of an account-level object to one of the following values:

  • both (default)
    • Both the prefix and the suffix are added to the name of a Snowflake object
    • If the namespacing property is not defined, this is the default
  • none
    • Neither the prefix nor the suffix is added to the name of the Snowflake object
    • Object Deletion: setting namespacing to none prevents deletion of an object. Refer to Environment specific Snowflake Objects for the reason
  • prefix
    • Only the prefix is added to the name of a Snowflake object
  • suffix
    • Only the suffix is added to the name of a Snowflake object
    • If the parameter DATAOPS_NONDB_ENV_NAME is set, its value is used as the suffix instead
  • external
    • Neither the prefix nor the suffix is added to the name of a Snowflake object
    • The object is managed completely (creation, update, and deletion), but no namespacing is applied
    • external can be used if the object names are managed outside the engine, for example by a script running before SOLE that sets object names to follow a predefined naming convention
    • Setting namespacing to external allows complete lifecycle management without adding the suffix or the prefix

Environment specific Snowflake Objects

Adding a suffix to a name scopes the object to the environment. This notifies SOLE that the object's lifecycle (creation, update and deletion) can be completely managed.

If a suffix is not set via namespacing, SOLE assumes the object is shared between environments. As deleting a shared object could have repercussions, SOLE does not delete such objects.

Example
dataops/snowflake/objects.yml
roles:
  ADMIN:
    namespacing: none
  QA:
    namespacing: suffix
databases:
  DB_1:
    namespacing: both
  DB_2:
    namespacing: prefix

With DATAOPS_PREFIX set to DATAOPS and DATAOPS_ENV_NAME set to PROD, the generated names are:

  • Role ADMIN remains as it is: ADMIN
  • Role QA is converted to QA_PROD
  • Database DB_1 is converted to DATAOPS_DB_1_PROD
  • Database DB_2 is converted to DATAOPS_DB_2

Database-Level Object Reference

As the engine operates on multiple databases and schemas as part of lifecycle management, it doesn't set a database or schema for the session when connecting to Snowflake. Because of this, referencing Database-Level objects requires specifying the complete path/namespace to the object.

If the referenced Database is managed and environment controlled, the environment-based namespace is resolved by the engine when referencing the object.

Such Database-Level references could be provided in one of three ways:

  • by Object name
  • by Object and Schema name
  • by Object, Schema, and Database name

Object Reference by Object Name

This can be used when the referenced object belongs to the same schema and database.

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        pipes:
          PIPE_1:
            copy_statement:
              into:
                table: "TABLE_1"
              from:
                stage: "STAGE_1"

If Prefix is set to DATAOPS and Suffix is set to PROD:

  • The fully qualified name for the table would be DATAOPS_DB_1_PROD.SCHEMA_1.TABLE_1
  • The fully qualified name for the stage would be DATAOPS_DB_1_PROD.SCHEMA_1.STAGE_1

Object Reference by Object and Schema name

This can be used when the referenced object belongs to a different schema but is part of the same database.

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        pipes:
          PIPE_1:
            copy_statement:
              into:
                schema: SCHEMA_2
                table: "TABLE_1"
              from:
                schema: SCHEMA_3
                stage: STAGE_1

If Prefix is set to DATAOPS and Suffix is set to PROD:

  • The fully qualified name for the table would be DATAOPS_DB_1_PROD.SCHEMA_2.TABLE_1
  • The fully qualified name for the stage would be DATAOPS_DB_1_PROD.SCHEMA_3.STAGE_1

Object Reference by Object, Schema, and Database name

This can be used when a referenced object belongs to a different database.

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        pipes:
          PIPE_1:
            copy_statement:
              into:
                database: DB_2
                schema: SCHEMA_2
                table: "TABLE_1"
              from:
                database: DB_3
                schema: SCHEMA_3
                stage: STAGE_1

Assuming that DB_2 is managed while DB_3 is not; if Prefix is set to DATAOPS and Suffix is set to PROD:

  • The fully qualified name for the table would be DATAOPS_DB_2_PROD.SCHEMA_2.TABLE_1
  • The fully qualified name for the stage would be DB_3.SCHEMA_3.STAGE_1

Even if the referenced object belongs to the same database and schema, explicit names can be provided as well.

Connection Parameters

To establish a connection to a Snowflake account, SOLE requires the following parameters:

Account

This is the name of the account in which SOLE operates.

This value is set in the variable DATAOPS_SOLE_ACCOUNT. If the value is not present in the variables, SOLE tries to obtain the value from the DataOps vault at path SNOWFLAKE.SOLE.ACCOUNT.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

The full account name, along with the region (e.g. eu-west-2), has to be defined in DATAOPS_SOLE_ACCOUNT. Organizational short names and region-less account names can be used in DATAOPS_SOLE_ACCOUNT as well.

To get the region-less account name for your Snowflake account, the following query can be executed:

SELECT system$whitelist();

The query returns a JSON result with a key SNOWFLAKE_DEPLOYMENT_REGIONLESS whose value is formatted as abcdef-qwerty.snowflakecomputing.com. The complete abcdef-qwerty part can be specified as the DATAOPS_SOLE_ACCOUNT.

Username

This is the username with which SOLE connects to the account specified in DATAOPS_SOLE_ACCOUNT.

This value is set in the variable DATAOPS_SOLE_USERNAME. If the value is not present in the variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.USERNAME.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

Password

The Password of the username specified in DATAOPS_SOLE_USERNAME.

This value is set in the variable DATAOPS_SOLE_PASSWORD. If the value is not present in environment variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.PASSWORD.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

Role

The Role of the Snowflake User as defined in DATAOPS_SOLE_USERNAME, which runs the SOLE queries. This value has to be defined even if the role is the default role for the Snowflake User.

This value is set in the variable DATAOPS_SOLE_ROLE. If the value is not present in the variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.ROLE.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.
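The four connection values above can also be kept together in the DataOps vault. A minimal sketch of such vault content, assuming the vault stores these entries as nested YAML keys (all values are placeholders):

SNOWFLAKE:
  SOLE:
    ACCOUNT: abcdef-qwerty
    USERNAME: SOLE_USER
    PASSWORD: <placeholder>
    ROLE: SOLE_ADMIN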

Terraform Variables

As SOLE utilizes Terraform, the credentials/variables are duplicated as Terraform variables (TF_VAR_ is prepended to each variable name). For example, DATAOPS_SOLE_ACCOUNT is duplicated as TF_VAR_DATAOPS_SOLE_ACCOUNT, and similarly for the other credentials. If such a variable already has a value, it is overridden.

Warehouse

The Snowflake warehouse used to run queries for SOLE Grant Management is defined in DATAOPS_SOLE_WAREHOUSE. This is an optional configuration. If not defined, SOLE Grant Management assumes that the user specified in DATAOPS_SOLE_USERNAME has a default warehouse set.

This value is set in the variable DATAOPS_SOLE_WAREHOUSE. If the value is not present in environment variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.WAREHOUSE. If no such vault path exists, SOLE Grant Management would run queries with the assumption that the default warehouse is set.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

SOLE Grant Management

For privilege/role grant management, SOLE provides a full alternative to Terraform. SOLE Grant Management uses the same configuration as the Terraform Snowflake provider to inspect, compare, and execute operations.

SOLE Grant Management is enabled by default. To disable it and revert to using the Terraform provider to manage grants, the variable DONT_USE_SOLE_GRANT_MANAGEMENT must be set.

SOLE Grant Management requires a warehouse to perform object inspection. The warehouse can either be specified in the variable DATAOPS_SOLE_WAREHOUSE or can be saved in the DataOps vault path SNOWFLAKE.SOLE.WAREHOUSE. If no Warehouse is defined, SOLE Grant Management would connect with the assumption that a default warehouse has been set for the user.

Default Warehouse

SOLE Grant Management requires that a default warehouse is assigned to the Snowflake user used to connect to Snowflake. To set a default warehouse for the user, the following query can be run in Snowflake:

ALTER USER <SOLE_USER> SET DEFAULT_WAREHOUSE = <WAREHOUSE_NAME>;

SOLE Grant Management inspects Snowflake at the time of running PLAN. It compares the grants defined in the configuration with those in Snowflake and works out which privilege/role grants are to be added or revoked. Due to this run-time inspection, no local state is required.

SOLE Grant Management also supports restricting the Roles and Users it operates on. In the configuration, these can be defined under the keys specified_roles and specified_users respectively. When defined, SOLE Grant Management only operates on Roles and Users that match the listed values. Either object names or regular expressions can be provided in specified_roles and specified_users.

Example
specified_roles:
  - DATAOPS_ADMIN
  - DATAOPS_*

specified_users:
  - SOLEADMIN
  - DATAOPS_*

With the above configuration, SOLE Grant Management would only grant or revoke a privilege if the role is either DATAOPS_ADMIN or matches the regex DATAOPS_*. Similarly, a role would only be granted or revoked if:

  • the Role is either DATAOPS_ADMIN or matches the regex DATAOPS_*, or
  • the User is either SOLEADMIN or matches the regex DATAOPS_*

Production Database and SOLE User Deletion Prevention

By default, the SOLE PLAN and COMPILE jobs fail the pipeline execution to prevent the deletion of a production database or the alteration of the user SOLE needs to connect to Snowflake.

Production Database Deletion Prevention

A database can be accidentally deleted in the PROD environment if the corresponding configuration is removed or commented out. This could affect the pipeline or other environments. To prevent such accidental deletion, SOLE fails the PLAN job if any database deletion is detected.

This prevention is only implemented in the PROD environment (master branch).

note

If your project has another branch set up as the PROD environment, this can be set with the environment variable DATAOPS_SOLE_PROD_BRANCH.

Deletion Prevention can be disabled by setting the variable DATAOPS_SOLE_OVERRIDE to any value.

SOLE User Deletion Prevention

SOLE doesn't allow management of the user required to connect to Snowflake. If a configuration for this user is found, the COMPILE job fails. This prevents accidental updates to the user's password or deletion of the user.

Deletion Prevention can be disabled by setting the variable DATAOPS_SOLE_OVERRIDE to any value.

danger

Setting the variable DATAOPS_SOLE_OVERRIDE may lead to the deletion of critical Snowflake Objects such as PROD databases and the SOLE user. A value should only be set if absolutely necessary.

Lifecycle Actions

COMPILE

When LIFECYCLE_ACTION is set to COMPILE, the engine generates the configuration required to initialize and manage the Snowflake objects from the user-defined specifications.

Using the specifications defined in the files, the engine performs validation and dependency resolution and generates resource configurations and syntax. During COMPILE, the engine also resolves the grants and revokes on Snowflake Objects. For more information, see the Access Management section.

The following parameters can be used to control the execution of COMPILE:

  • CONFIGURATION_DIR (REQUIRED): Path of the directory in which the configuration files are present
  • DATAOPS_PREFIX (REQUIRED): Prefix added to Account-Level objects and Databases (except the default database)
  • DATAOPS_ENV_NAME (REQUIRED): Suffix added to Account-Level objects and Databases (except the default database)
  • DATAOPS_DEBUG (OPTIONAL): If set, enables debug logs for the Engine

VALIDATE

When LIFECYCLE_ACTION is set to VALIDATE, the engine performs validation of the resource configurations generated during the COMPILE stage.

PLAN

When LIFECYCLE_ACTION is set to PLAN, the engine performs planning of execution steps to create or update the managed Snowflake Objects.

Before generating a plan, the engine tries to import all specified Snowflake Objects that already exist. This import initializes or updates the local state file for existing Snowflake Objects, which is then used to generate the plan. An import summary is displayed during and after completion of the import.

Due to aliases and multiple format support, a plan might have in-place changes for Snowflake Objects. Refer to In-Place Changes for more information.

If the state file is corrupted, or if there is a discrepancy in it (an Object is managed in the state but was deleted from Snowflake externally), the file can be removed and re-initialized from the current Snowflake Object configuration. To do so, set the variable LIFECYCLE_STATE_RESET to any value. This can be set for the whole pipeline, to re-initialize the state for all resource groups, or individually on any PLAN job to reset the state for just that resource.

The generated plan is provided to the APPLY action.

PLAN-DESTROY

Similar to PLAN, when LIFECYCLE_ACTION is set to PLAN-DESTROY, the engine performs planning of execution steps to destroy the managed Snowflake Objects.

Using the existing state (if any), an execution plan to destroy the managed Snowflake Objects is generated and saved. The execution plan lists the changes it will perform to delete/drop the managed Snowflake Objects.

The generated plan is provided to the DESTROY action.

APPLY

When LIFECYCLE_ACTION is set to APPLY, the engine executes the actions specified by the PLAN action.

Using the output from the plan, the engine executes the defined steps. This ensures that the engine only performs those changes that were logged by the PLAN action. This implies that the user is aware of all the changes to be made.

DESTROY

When LIFECYCLE_ACTION is set to DESTROY, the engine executes the actions specified by the PLAN-DESTROY action.

Using the output from PLAN-DESTROY, the engine executes the defined deletion steps. This ensures that the engine only performs those changes that were logged by the PLAN-DESTROY action. This implies that the user is aware of all the changes to be made.

AGGREGATE

As an alternative to setting up individual jobs for each resource group and action, an aggregate action can be configured that performs COMPILE on the configuration and then VALIDATE, PLAN, and APPLY for all resource groups sequentially in a single job.

This cuts the number of jobs needed in a pipeline to manage the lifecycle of Snowflake Objects down to a single job.

This can be executed by setting LIFECYCLE_ACTION to AGGREGATE.

As AGGREGATE acts on all resource groups, LIFECYCLE_MANAGE_OBJECT does not have to be specified; however, other variables such as CONFIGURATION_DIR are still required. A sketch of such a job follows.
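A minimal sketch of an aggregate job, with the same assumptions about the job name, stage, and script entry point as the earlier Account-Level example:

Snowflake Lifecycle:
  stage: Snowflake Setup
  variables:
    LIFECYCLE_ACTION: AGGREGATE
    CONFIGURATION_DIR: $CI_PROJECT_DIR/dataops/snowflake
  script:
    - /dataops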

AGGREGATE-DESTROY

Similar to AGGREGATE, AGGREGATE-DESTROY performs the PLAN-DESTROY and DESTROY actions in a single job.

This can be executed by setting LIFECYCLE_ACTION to AGGREGATE-DESTROY.

Supported Snowflake Objects

The following is the list of Snowflake Objects whose lifecycle can be managed with the Snowflake Object Lifecycle Engine. For more information on configuration, supported parameters, and examples, see the page for each object.