
SOLE for Data Products Migration

Feature release status: Public Preview

SOLE for Data Products is a new framework for SOLE (Snowflake Object Lifecycle Engine) that helps you build an ecosystem of data products. It uses a different structure for its declarative configuration than the current SOLE. While the current Snowflake object configuration stores all objects of the same type in a single file, SOLE for Data Products splits object definitions into separate files, which you can organize into type-specific or domain-specific files. Learn more about SOLE for Data Products.

To use SOLE for Data Products with your existing DataOps projects, you must migrate your current Snowflake object configuration to the new structure.

This page, together with the SOLE Reference Guide, uses the DataOps Template Project as a reference to show, with examples, how to convert the DataOps standard template to the structure that works with SOLE for Data Products.
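For orientation, the kind of layout this enables looks roughly like the sketch below. The file names are purely illustrative; any split into type-specific or domain-specific files works, provided each file follows the SOLE for Data Products structure described in the SOLE Reference Guide.

dataops/snowflake/
  dataops_config.yml      ## enables SOLE for Data Products (can be empty)
  roles.yml               ## type-specific file: all account roles
  warehouses.yml          ## type-specific file: warehouses and resource monitors
  ingestion.yml           ## domain-specific file: INGESTION_SCHEMA and its objects
  transformation.yml      ## domain-specific file: TRANSFORMATION_SCHEMA and its objects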

Migration Tools

We are aware that manually converting the configuration for a large number of objects is not feasible. To help, we are working on a migration tool that converts the current SOLE configuration to the new structure with a good level of accuracy.

We aim to make this tool available before the public preview of the functionality and appreciate your patience.

DataOps Template Project

We use the standard DataOps Template Project as a reference for the migration process. Find more about its implementation at DataOps Template on our data product platform.

We will update existing objects and also add the following new objects to show more examples of the migration procedure:

  • Resource Monitor INGESTION_MONITOR
  • Schema INGESTION_SCHEMA
    • Table INGESTION_TABLE
    • File format CSV_FORMAT
    • Stage CSV_STAGE
  • Schema FIVETRAN_SCHEMA
    • Stage CSV_STAGE
  • Schema TRANSFORMATION_SCHEMA
    • File format CSV_FORMAT
    • Stage CSV_STAGE

Converting the current configuration to SOLE for Data Products

To convert the DataOps Template Project to use SOLE for Data Products, you must:

  1. Add a new project configuration file.
  2. Update the configuration of all existing objects one by one.

Adding a project configuration file

Add the file dataops_config.yml to the dataops/snowflake directory. This file is required to enable SOLE for Data Products; it can be completely empty.
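For example, a file containing nothing but a comment is enough:

dataops/snowflake/dataops_config.yml
## The presence of this file enables SOLE for Data Products.
## It can stay completely empty.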

Converting roles

In the DataOps Template Project, there are three roles: READER, WRITER, and ADMIN. Their configuration in the current structure is shown below:

dataops/snowflake/roles.template.yml
## Main DataOps role hierarchy. READER -> WRITER -> ADMIN -> DATAOPS_SOLE_ADMIN_ROLE
## These roles are shared across all environments and are managed from PROD
roles:
  ## Read-only role, reserved for future use
  READER:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - WRITER

  ## Main transformation role (used in MATE). Can create schemas, tables, and views
  WRITER:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - ADMIN

  ## Administration role, used for reporting
  ADMIN:
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    roles:
      - "{{ SNOWFLAKE.SOLE.ROLE }}"

Converting databases

In the DataOps Template Project, there are two databases: DATAOPS_DATABASE and SNOWFLAKE. Their configuration in the current structure is shown below:

dataops/snowflake/databases.template.yml
databases:

  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}

    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER

  ## Grant access to Snowflake's built-in metadata database
  ## This is required for some reporting applications
  SNOWFLAKE:
    namespacing: none
    manage_mode: grants
    grants:
      IMPORTED PRIVILEGES:
        - ADMIN
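To make the Jinja condition concrete: in a feature-branch pipeline, where DATAOPS_ENV_NAME matches neither the production nor the QA environment, the main database definition effectively renders as follows (the database names are invented placeholders for the resolved environment variables):

## Rendered view in a feature branch (illustrative values only)
"DATAOPS_EXAMPLE_FB_MY_BRANCH":
  comment: This is the main DataOps database for environment FB_MY_BRANCH
  from_database: "DATAOPS_EXAMPLE_PROD"  ## cloned from the main production database
  data_retention_time_in_days: 1
  grants:
    USAGE:
      - READER
    CREATE SCHEMA:
      - WRITER

In the production and QA environments the conditional block is not rendered, so the database is created rather than cloned.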

Converting schemas

As previously mentioned, we will add configuration for new schemas and schema-level objects to provide more migration examples.

This section adds the three new schemas INGESTION_SCHEMA, FIVETRAN_SCHEMA, and TRANSFORMATION_SCHEMA to the current configuration. Schema-level objects are added to all three schemas in the next section.

In SOLE for Data Products, you can add the configurations for the new schemas in either the existing databases.template.yml or a new file.

dataops/snowflake/databases.template.yml
databases:

  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}

    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER

    schemas:
      INGESTION_SCHEMA:
        comment: Schema for Ingestion
      FIVETRAN_SCHEMA:
        comment: Schema for Fivetran Ingestion
      TRANSFORMATION_SCHEMA:
        comment: Schema for Transformation
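As noted above, the new schemas could instead live in a file of their own. A possible domain-specific split might look like this (the file name is hypothetical, and the attribute keys are shown as in the current structure; check the SOLE Reference Guide for the exact SOLE for Data Products syntax):

dataops/snowflake/ingestion_schemas.yml
## Hypothetical file keeping the new schemas separate from the database definition.
## Attribute names follow the current structure; the new structure may differ.
schemas:
  INGESTION_SCHEMA:
    database: "{{ env.DATAOPS_DATABASE }}"
    comment: Schema for Ingestion
  FIVETRAN_SCHEMA:
    database: "{{ env.DATAOPS_DATABASE }}"
    comment: Schema for Fivetran Ingestion
  TRANSFORMATION_SCHEMA:
    database: "{{ env.DATAOPS_DATABASE }}"
    comment: Schema for Transformation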

Converting schema-level objects

To provide examples of what schema-level object configuration looks like, let's add new objects to all three schemas.

In INGESTION_SCHEMA, add the three objects:

  • Table INGESTION_TABLE
  • File format CSV_FORMAT
  • Stage CSV_STAGE
    • This stage uses the file format CSV_FORMAT defined in the same schema.

In TRANSFORMATION_SCHEMA, add the two objects:

  • File format CSV_FORMAT
  • Stage CSV_STAGE
    • This stage uses the file format CSV_FORMAT defined in the same schema.

In FIVETRAN_SCHEMA, add one object:

  • Stage CSV_STAGE
    • This stage uses the file format CSV_FORMAT defined in the schema INGESTION_SCHEMA.

dataops/snowflake/databases.template.yml
databases:

  ## Main DataOps database
  ## Its name is generated by scripts in the DataOps reference project and set here using an environment variable
  "{{ env.DATAOPS_DATABASE }}":
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

    {% if (env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_PROD and env.DATAOPS_ENV_NAME != env.DATAOPS_ENV_NAME_QA) %}
    ## Pipelines running in branches that are not main or qa will clone this database from the main production database
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}

    ## Grant access to this database for all roles, with WRITER also having the ability to create schemas
    grants:
      USAGE:
        - READER
      CREATE SCHEMA:
        - WRITER

    schemas:
      INGESTION_SCHEMA:
        comment: Schema for Ingestion
        tables:
          INGESTION_TABLE:
            columns:
              COL_1:
                type: VARCHAR
              COL_2:
                type: VARCHAR
        file_formats:
          CSV_FORMAT:
            comment: "File format for type csv"
            format_type: CSV
            compression: AUTO
            binary_format: HEX
            record_delimiter: "\n"
            grants:
              USAGE:
                - WRITER
                - READER
                - ADMIN
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format: CSV_FORMAT
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN

      FIVETRAN_SCHEMA:
        comment: Schema for Fivetran Ingestion
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format:
              format_name: CSV_FORMAT
              format_schema: INGESTION_SCHEMA
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN

      TRANSFORMATION_SCHEMA:
        comment: Schema for Transformation
        file_formats:
          CSV_FORMAT:
            comment: "File format for type csv"
            format_type: CSV
            compression: AUTO
            binary_format: HEX
            record_delimiter: "\n"
            grants:
              USAGE:
                - WRITER
                - READER
                - ADMIN
        stages:
          CSV_STAGE:
            comment: Load CSV Data
            url: s3://source-bucket/csv/
            CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
            file_format: CSV_FORMAT
            grants:
              ALL PRIVILEGES:
                - WRITER
                - READER
                - ADMIN

Converting resource monitor

To provide an example of what a resource monitor configuration looks like, let's add INGESTION_MONITOR in the current structure. This monitor is attached to the INGESTION warehouse in the next section.

dataops/snowflake/ingestion_monitor.yml
resource_monitors:
  INGESTION_MONITOR:
    credit_quota: 100
    notify_triggers:
      - 40
      - 50
    suspend_triggers:
      - 75
    suspend_immediate_triggers:
      - 90
    grants:
      ALL PRIVILEGES:
        - WRITER
        - READER
        - ADMIN

Converting warehouses

In the DataOps Template Project, there are two warehouses: INGESTION and TRANSFORMATION. Their configuration in the current structure is shown below:

dataops/snowflake/warehouses.template.yml
## Warehouses for all common DataOps operations
## These warehouses are shared across all environments and are managed from PROD
## Note: The clustering/scaling options should be removed for standard-tier Snowflake accounts
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard snowflake account
    ## Read more on snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    resource_monitor: INGESTION_MONITOR

    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN

  TRANSFORMATION:
    comment: Warehouse for Transformation operations
    warehouse_size: MEDIUM
    ## Comment out max_cluster_count, min_cluster_count, and scaling_policy if you are using a standard snowflake account
    ## Read more on snowflake account types here: https://docs.snowflake.com/en/user-guide/intro-editions.html
    max_cluster_count: 2
    min_cluster_count: 1
    scaling_policy: ECONOMY
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"

    grants:
      USAGE:
        - READER
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN
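As the comments in the file point out, multi-cluster warehouses require Snowflake Enterprise Edition or higher. On a standard-tier account, the same INGESTION warehouse would be trimmed to something like:

## INGESTION warehouse for a standard-tier Snowflake account: clustering/scaling options removed
warehouses:
  INGESTION:
    comment: Warehouse for Ingestion operations
    warehouse_size: MEDIUM
    auto_suspend: 40
    auto_resume: true
    namespacing: prefix
    environment: "{{ env.DATAOPS_ENV_NAME_PROD }}"
    resource_monitor: INGESTION_MONITOR
    grants:
      USAGE:
        - WRITER
        - ADMIN
      MONITOR:
        - ADMIN
      OPERATE:
        - ADMIN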