
SOLE for Data Products Migration

Feature release status: Public Preview

SOLE for Data Products is a new framework for SOLE (Snowflake Object Lifecycle Engine) that helps you build an ecosystem of data products. It uses a different structure for its declarative configuration than the current SOLE. While the current Snowflake object configuration stores all objects of the same type in a single file, SOLE for Data Products splits object definitions into separate files, which you can organize into type-specific or domain-specific files. Learn more about SOLE for Data Products.

To use SOLE for Data Products with your existing DataOps projects, you must migrate your current Snowflake object configuration to the new structure.

This page, together with the SOLE Reference Guide, uses the DataOps Template Project as a reference to show, with examples, how to convert the DataOps standard template to the structure that works with SOLE for Data Products.
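For orientation, the kind of layout this enables looks roughly like the sketch below. The file names are purely illustrative; any split into type-specific or domain-specific files works, provided each file follows the SOLE for Data Products structure described in the SOLE Reference Guide.

dataops/snowflake/
  dataops_config.yml      ## enables SOLE for Data Products (can be empty)
  roles.yml               ## type-specific file: all account roles
  warehouses.yml          ## type-specific file: warehouses and resource monitors
  ingestion.yml           ## domain-specific file: INGESTION_SCHEMA and its objects
  transformation.yml      ## domain-specific file: TRANSFORMATION_SCHEMA and its objects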

Migration Tools

We are aware that manually converting the configuration for a large number of objects is not feasible. To help, we are working on a migration tool that converts the current SOLE configuration to the new structure with a good level of accuracy.

We aim to make this tool available before the public preview of the functionality and appreciate your patience.

DataOps Template Project

We use the standard DataOps Template Project as a reference for the migration process. Find more about its implementation at DataOps Template on our data product platform.

We will update existing objects and also add the following new objects to show more examples of the migration procedure:

  • Resource Monitor INGESTION_MONITOR
  • Schema INGESTION_SCHEMA
    • Table INGESTION_TABLE
    • File format CSV_FORMAT
    • Stage CSV_STAGE
  • Schema FIVETRAN_SCHEMA
    • Stage CSV_STAGE
  • Schema TRANSFORMATION_SCHEMA
    • File format CSV_FORMAT
    • Stage CSV_STAGE

Converting the current configuration to SOLE for Data Products

To convert the DataOps Template Project to use SOLE for Data Products, you must:

  1. Add a new project configuration file.
  2. Update the configuration of all existing objects one by one.

Adding a project configuration file

Add the file dataops_config.yml to the dataops/snowflake directory. This file is required to enable SOLE for Data Products; it can be completely empty.
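For example, a file containing nothing but a comment is enough:

dataops/snowflake/dataops_config.yml
## The presence of this file enables SOLE for Data Products.
## It can stay completely empty.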

Converting roles

In the DataOps Template Project, there are three roles: READER, WRITER, and ADMIN. Their configuration in the current structure is shown below:

dataops/snowflake/roles.template.yml
## Main DataOps role hierarchy. READER -> WRITER -> ADMIN -> DATAOPS_SOLE_ADMIN_ROLE
## These roles are shared across all environments and are managed from PROD
roles:
  ## Read-only role, reserved for future use
  READER:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - WRITER

  ## Main transformation role (used in MATE). Can create schemas, tables, and views
  WRITER:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - ADMIN

  ## Administration role, used for reporting
  ADMIN:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - "{{ SNOWFLAKE.SOLE.ROLE }}"

Converting databases

In the DataOps Template Project, there are two databases: DATAOPS_DATABASE and SNOWFLAKE. Their configuration in the current structure is shown below:

dataops/snowflake/databases.template.yml
databases:

  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}

    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER

  ## Grant access to Snowflake's built-in metadata database
  ## This is required for some reporting applications
  SNOWFLAKE:
    namespacing: none
    manage_mode: grants
    grants:
      IMPORTED PRIVILEGES:
        - ADMIN
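To make the Jinja condition concrete: in a feature-branch pipeline, where DATAOPS_ENV_NAME matches neither the production nor the QA environment, the main database definition effectively renders as follows (the database names are invented placeholders for the resolved environment variables):

## Rendered view in a feature branch (illustrative values only)
"DATAOPS_EXAMPLE_FB_MY_BRANCH":
  comment: This is the main DataOps database for environment FB_MY_BRANCH
  from_database: "DATAOPS_EXAMPLE_PROD"  ## cloned from the main production database
  data_retention_time_in_days: 1
  grants:
    USAGE:
      - READER
    CREATE SCHEMA:
      - WRITER

In the production and QA environments the conditional block is not rendered, so the database is created rather than cloned.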

Converting schemas

As previously mentioned, we will add configuration for new schemas and schema-level objects to provide more migration examples.

This section adds the three new schemas INGESTION_SCHEMA, FIVETRAN_SCHEMA, and TRANSFORMATION_SCHEMA to the current configuration. Schema-level objects are added to all three schemas in the next section.

In SOLE for Data Products, you can add the configurations for the new schemas in either the existing databases.template.yml or a new file.

dataops/snowflake/databases.template.yml
databases:

  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}

    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER

    schemas:
      INGESTION_SCHEMA:
        comment: Schema for Ingestion
      FIVETRAN_SCHEMA:
        comment: Schema for Fivetran Ingestion
      TRANSFORMATION_SCHEMA:
        comment: Schema for Transformation
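As noted above, the new schemas could instead live in a file of their own. A possible domain-specific split might look like this (the file name is hypothetical, and the attribute keys are shown as in the current structure; check the SOLE Reference Guide for the exact SOLE for Data Products syntax):

dataops/snowflake/ingestion_schemas.yml
## Hypothetical file keeping the new schemas separate from the database definition.
## Attribute names follow the current structure; the new structure may differ.
schemas:
  INGESTION_SCHEMA:
    database: "{{ env.DATAOPS_DATABASE }}"
    comment: Schema for Ingestion
  FIVETRAN_SCHEMA:
    database: "{{ env.DATAOPS_DATABASE }}"
    comment: Schema for Fivetran Ingestion
  TRANSFORMATION_SCHEMA:
    database: "{{ env.DATAOPS_DATABASE }}"
    comment: Schema for Transformation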

Converting schema-level objects

To provide examples of what schema-level object configuration looks like, let's add new objects to all three schemas.

In INGESTION_SCHEMA, add the three objects:

  • Table INGESTION_TABLE
  • File format CSV_FORMAT
  • Stage CSV_STAGE
    • This stage uses the file format CSV_FORMAT defined in the same schema.

In TRANSFORMATION_SCHEMA, add the two objects:

  • File format CSV_FORMAT
  • Stage CSV_STAGE
    • This stage uses the file format CSV_FORMAT defined in the same schema.

In FIVETRAN_SCHEMA, add one object:

  • Stage CSV_STAGE
    • This stage uses the file format CSV_FORMAT defined in the schema INGESTION_SCHEMA.

dataops/snowflake/databases.template.yml
databases:

  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}

    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER

    schemas:
      INGESTION_SCHEMA:
        comment: Schema for Ingestion
        tables:
          INGESTION_TABLE:
            columns:
              COL_1:
                type: VARCHAR
              COL_2:
                type: VARCHAR
        file_formats:
          CSV_FORMAT:
            comment: "File format for type csv"
            format_type: CSV
            compression: AUTO
            binary_format: HEX
            record_delimiter: "\n"
            grants:
              USAGE:
                - WRITER
                - READER
                - ADMIN
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format: CSV_FORMAT
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN

      FIVETRAN_SCHEMA:
        comment: Schema for Fivetran Ingestion
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format:
              format_name: CSV_FORMAT
              format_schema: INGESTION_SCHEMA
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN

      TRANSFORMATION_SCHEMA:
        comment: Schema for Transformation
        file_formats:
          CSV_FORMAT:
            comment: "File format for type csv"
            format_type: CSV
            compression: AUTO
            binary_format: HEX
            record_delimiter: "\n"
            grants:
              USAGE:
                - WRITER
                - READER
                - ADMIN
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format: CSV_FORMAT
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN

Converting resource monitor

To provide an example of what a resource monitor configuration looks like, let's add INGESTION_MONITOR in the current structure. This monitor is attached to the INGESTION warehouse in the next section.

dataops/snowflake/ingestion_monitor.yml
resource_monitors:
  INGESTION_MONITOR:
    credit_quota: 100
    notify_triggers:
      - 40
      - 50
    suspend_triggers:
      - 75
    suspend_immediate_triggers:
      - 90
    grants:
      ALL PRIVILEGES:
        - WRITER
        - READER
        - ADMIN

Converting warehouses

In the DataOps Template Project, there are two warehouses: INGESTION and TRANSFORMATION. Their configuration in the current structure is shown below:

dataops/snowflake/warehouses.template.yml
## Warehouses for all common DataOps operations
## These warehouses are shared across all environments and are managed from PROD
## Note: The clustering/scaling options should be removed for standard-tier Snowflake accounts
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard snowflake account
    ## Read more on snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    resource_monitor: INGESTION_MONITOR

    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN

  TRANSFORMATION:
    comment: Warehouse for Transformation operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard snowflake account
    ## Read more on snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"

    grants:
      USAGE:
        - READER
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
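As the comments in the file point out, multi-cluster warehouses require Snowflake Enterprise Edition or higher. On a standard-tier account, the same INGESTION warehouse would be trimmed to something like:

## INGESTION warehouse for a standard-tier Snowflake account: clustering/scaling options removed
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    resource_monitor: INGESTION_MONITOR
    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN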