SOLE for Data Products Migration
SOLE for Data Products is a new framework for SOLE (Snowflake Object Lifecycle Engine) that helps you quickly build an ecosystem of data products. It requires a different structure for its declarative configuration than the current SOLE: while the current Snowflake object configuration stores all objects of the same type in a single file, SOLE for Data Products splits object definitions into separate files, which you can organize by type or by domain. Learn about SOLE for Data Products.
To use SOLE for Data Products with your existing DataOps projects, you must migrate your current Snowflake object configuration to the new structure.
This page, together with the SOLE Reference Guide, uses the DataOps Template Project as a reference and explains with examples how to convert the standard DataOps template to the structure that SOLE for Data Products requires.
We know that manually converting the configuration for many objects is not feasible. To help with this, we are working on a migration tool that converts the current SOLE configuration to the new structure.
We aim to make this tool available before the public preview of the functionality and appreciate your patience.
DataOps Template Project
We use the standard DataOps Template Project as a reference for the migration process. You can find more about its implementation at DataOps Template on our data product platform.
We will update the existing objects and also add the new objects listed below to give you more examples of the migration procedure:
- Resource monitor INGESTION_MONITOR
- Schema INGESTION_SCHEMA
  - Table INGESTION_TABLE
  - File format CSV_FORMAT
  - Stage CSV_STAGE
- Schema FIVETRAN_SCHEMA
  - Stage CSV_STAGE
- Schema TRANSFORMATION_SCHEMA
  - File format CSV_FORMAT
  - Stage CSV_STAGE
Converting current configuration to SOLE for Data Products
To convert the DataOps Template Project to use SOLE for Data Products, you must:
- Add a new project configuration file.
- Update the configuration of all existing objects one by one.
Adding a project configuration file
Add the file dataops_config.yml in the dataops/snowflake directory. This file is necessary to enable SOLE for Data Products; an empty file with no content is enough to turn the functionality on.
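As a minimal sketch, assuming you follow the template project's directory layout, the new file sits alongside the existing configuration and may be left completely empty:

```yaml
## dataops/snowflake/dataops_config.yml
## An empty file is sufficient to enable SOLE for Data Products.
```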
Converting roles
In the DataOps Template Project, there are three roles: READER, WRITER, and ADMIN. The following are the configurations for these three objects in the current structure and in the data products structure:
Default Configuration:

## Main DataOps role hierarchy. READER -> WRITER -> ADMIN -> DATAOPS_SOLE_ADMIN_ROLE
## These roles are shared across all environments and are managed from PROD
roles:
  ## Read-only role, reserved for future use
  READER:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - WRITER
  ## Main transformation role (used in MATE). Can create schemas, tables, and views
  WRITER:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - ADMIN
  ## Administration role, used for reporting
  ADMIN:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - "{{ SNOWFLAKE.SOLE.ROLE }}"
Data Products Configuration:

## Main DataOps role hierarchy. READER -> WRITER -> ADMIN -> DATAOPS_SOLE_ADMIN_ROLE
## These roles are shared across all environments and are managed from PROD
## Read-only role, reserved for future use
- role:
    name: READER
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - rel(role.WRITER)
## Main transformation role (used in MATE). Can create schemas, tables, and views
- role:
    name: WRITER
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - rel(role.ADMIN)
## Administration role, used for reporting
- role:
    name: ADMIN
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - "{{ SNOWFLAKE.SOLE.ROLE }}"
Converting databases
In the DataOps Template Project, there are two databases, DATAOPS_DATABASE and SNOWFLAKE. The following are the configurations for these two objects in the current structure and in the data products structure:
Default Configuration:

databases:
  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}
    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER
  ## Grant access to Snowflake's built-in metadata database
  ## This is required for some reporting applications
  SNOWFLAKE:
    namespacing: none
    manage_mode: grants
    grants:
      IMPORTED PRIVILEGES:
        - ADMIN
Data Products Configuration:

## Main DataOps database
## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
- database:
    name: "{{ env.DATAOPS_DATABASE }}"
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}
    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - rel(role.READER)
      CREATE SCHEMA:
        - rel(role.WRITER)
## Grant access to Snowflake's built-in metadata database
## This is required for some reporting applications
- database:
    name: SNOWFLAKE
    manage_mode: grants
    namespacing: none
    grants:
      IMPORTED PRIVILEGES:
        - rel(role.ADMIN)
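Note that the Jinja templating carries over unchanged: the {% if %} block and the environment variables behave exactly as they do in the current structure. Likewise, manage_mode: grants keeps its meaning, so SOLE manages only the grants on the built-in SNOWFLAKE database rather than the database itself.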
Converting schemas
As previously mentioned, we will add configurations for new schemas and schema-level objects to provide more migration examples.
This section adds the three new schemas INGESTION_SCHEMA, FIVETRAN_SCHEMA, and TRANSFORMATION_SCHEMA in the current and data products configurations. Schema-level objects are added to all three schemas in the next section.
In SOLE for Data Products, you can add the configurations for the new schemas either in the existing databases.template.yml or in a new file (see the sketch after the configurations below).
Default Configuration:

databases:
  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}
    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER
    schemas:
      INGESTION_SCHEMA:
        comment: Schema for Ingestion
      FIVETRAN_SCHEMA:
        comment: Schema for Fivetran Ingestion
      TRANSFORMATION_SCHEMA:
        comment: Schema for Transformation
Data Products Configuration:

- schema:
    name: INGESTION_SCHEMA
    comment: Schema for Ingestion
    database: rel(database.{{ env.DATAOPS_DATABASE }})
- schema:
    name: FIVETRAN_SCHEMA
    comment: Schema for Fivetran Ingestion
    database: rel(database.{{ env.DATAOPS_DATABASE }})
- schema:
    name: TRANSFORMATION_SCHEMA
    comment: Schema for Transformation
    database: rel(database.{{ env.DATAOPS_DATABASE }})
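As mentioned above, these entries do not have to live in databases.template.yml. A minimal sketch of a domain-specific file, assuming an illustrative file name of ingestion.template.yml:

```yaml
## dataops/snowflake/ingestion.template.yml (illustrative file name)
## All objects for the ingestion domain could be grouped here.
- schema:
    name: INGESTION_SCHEMA
    comment: Schema for Ingestion
    database: rel(database.{{ env.DATAOPS_DATABASE }})
```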
Converting schema-level objects
To provide examples of what the schema-level object configuration looks like, let's add new objects to all three schemas.

In INGESTION_SCHEMA, add three objects:

- Table INGESTION_TABLE
- File format CSV_FORMAT
- Stage CSV_STAGE, which uses the file format CSV_FORMAT present in the same schema

In TRANSFORMATION_SCHEMA, add two objects:

- File format CSV_FORMAT
- Stage CSV_STAGE, which uses the file format CSV_FORMAT present in the same schema

In FIVETRAN_SCHEMA, add one object:

- Stage CSV_STAGE, which uses the file format CSV_FORMAT present in the schema INGESTION_SCHEMA
Default Configuration:

databases:
  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}
    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}
    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER
    schemas:
      INGESTION_SCHEMA:
        comment: Schema for Ingestion
        tables:
          INGESTION_TABLE:
            columns:
              COL_1:
                type: VARCHAR
              COL_2:
                type: VARCHAR
        file_formats:
          CSV_FORMAT:
            comment: "File format for type csv"
            format_type: CSV
            compression: AUTO
            binary_format: HEX
            record_delimiter: "\n"
            grants:
              USAGE:
                - WRITER
                - READER
                - ADMIN
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format: CSV_FORMAT
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN
      FIVETRAN_SCHEMA:
        comment: Schema for Fivetran Ingestion
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format:
              format_name: CSV_FORMAT
              format_schema: INGESTION_SCHEMA
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN
      TRANSFORMATION_SCHEMA:
        comment: Schema for Transformation
        file_formats:
          CSV_FORMAT:
            comment: "File format for type csv"
            format_type: CSV
            compression: AUTO
            binary_format: HEX
            record_delimiter: "\n"
            grants:
              USAGE:
                - WRITER
                - READER
                - ADMIN
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format: CSV_FORMAT
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN
Data Products Configuration:

## Table in schema INGESTION_SCHEMA
- table:
    name: INGESTION_TABLE
    schema: rel(schema.INGESTION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    columns:
      COL_1:
        type: VARCHAR
      COL_2:
        type: VARCHAR
## File format in schema INGESTION_SCHEMA
- file_format:
    name: CSV_FORMAT
    ## Assigning a unique ID so it can be used from any schema in any database
    dataops.id: INGESTION_CSV_FORMAT
    schema: rel(schema.INGESTION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    format_type: CSV
    compression: AUTO
    binary_format: HEX
    record_delimiter: "\n"
    grants:
      USAGE:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
## Stage in schema INGESTION_SCHEMA
- stage:
    comment: Load CSV Data
    name: CSV_STAGE
    schema: rel(schema.INGESTION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    url: s3://source-bucket/csv/
    CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
    file_format: rel(file_format.INGESTION_CSV_FORMAT)
    grants:
      ALL PRIVILEGES:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
## Stage in schema FIVETRAN_SCHEMA
- stage:
    comment: Load CSV Data
    name: CSV_STAGE
    schema: rel(schema.FIVETRAN_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    ## With the unique ID, using the file format present in schema INGESTION_SCHEMA
    file_format: rel(file_format.INGESTION_CSV_FORMAT)
    url: s3://source-bucket/csv/
    CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
    grants:
      ALL PRIVILEGES:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
## File format in schema TRANSFORMATION_SCHEMA
- file_format:
    name: CSV_FORMAT
    schema: rel(schema.TRANSFORMATION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    format_type: CSV
    compression: AUTO
    binary_format: HEX
    record_delimiter: "\n"
    grants:
      USAGE:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
## Stage in schema TRANSFORMATION_SCHEMA
- stage:
    comment: Load CSV Data
    name: CSV_STAGE
    schema: rel(schema.TRANSFORMATION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    url: s3://staging-bucket/csv/
    CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
    file_format: rel(file_format.CSV_FORMAT)
    grants:
      ALL PRIVILEGES:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
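Note the role of dataops.id here: both INGESTION_SCHEMA and TRANSFORMATION_SCHEMA define a file format named CSV_FORMAT. Assigning the unique ID INGESTION_CSV_FORMAT to the ingestion copy lets the FIVETRAN_SCHEMA stage reference it unambiguously from another schema with rel(file_format.INGESTION_CSV_FORMAT), while the TRANSFORMATION_SCHEMA stage still resolves rel(file_format.CSV_FORMAT) to its local file format.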
Converting resource monitors
To provide an example of what the resource monitor configuration looks like, let's add the configurations for INGESTION_MONITOR in the current structure and data products. INGESTION_MONITOR will be used with the warehouse INGESTION.
Default Configuration:

resource_monitors:
  INGESTION_MONITOR:
    credit_quota: 100
    notify_triggers:
      - 40
      - 50
    suspend_triggers:
      - 75
    suspend_immediate_triggers:
      - 90
    grants:
      ALL PRIVILEGES:
        - WRITER
        - READER
        - ADMIN
Data Products Configuration:

- resource_monitor:
    name: INGESTION_MONITOR
    credit_quota: 100
    notify_triggers:
      - 40
      - 50
    suspend_triggers:
      - 75
    suspend_immediate_triggers:
      - 90
    grants:
      ALL PRIVILEGES:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
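In the next section, the INGESTION warehouse picks up this monitor through rel(resource_monitor.INGESTION_MONITOR) instead of the bare name used in the current structure.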
Converting warehouses
In the DataOps Template Project, there are two warehouses, INGESTION and TRANSFORMATION. The following are the configurations for these two objects in the current structure and in the data products structure:
Default Configuration:

## Warehouses for all common DataOps operations
## These warehouses are shared across all environments and are managed from PROD
## Note: The clustering/scaling options should be removed for standard-tier Snowflake accounts
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard Snowflake account
    ## Read more on Snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    resource_monitor: INGESTION_MONITOR
    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
  TRANSFORMATION:
    comment: Warehouse for Transformation operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard Snowflake account
    ## Read more on Snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    grants:
      USAGE:
        - READER
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
Data Products Configuration:

## Warehouses for all common DataOps operations
## These warehouses are shared across all environments and are managed from PROD
## Note: The clustering/scaling options should be removed for standard-tier Snowflake accounts
- warehouse:
    name: INGESTION
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard Snowflake account
    ## Read more on Snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    resource_monitor: rel(resource_monitor.INGESTION_MONITOR)
    grants:
      USAGE:
        - rel(role.WRITER)
        - rel(role.ADMIN)
      MONITOR:
        - rel(role.ADMIN)
      OPERATE:
        - rel(role.ADMIN)
- warehouse:
    name: TRANSFORMATION
    comment: Warehouse for Transformation operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard Snowflake account
    ## Read more on Snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    grants:
      USAGE:
        - rel(role.READER)
      MONITOR:
        - rel(role.ADMIN)
      OPERATE:
        - rel(role.ADMIN)