DataOps Snowflake Object Lifecycle Engine

The Declarative way to drive your entire Snowflake estate

The Snowflake Object Lifecycle Engine (SOLE) manages the lifecycle of Snowflake Objects defined in a configuration specified by the user. The full list of supported objects is at Supported Snowflake Objects.

Users can define the configuration of Snowflake objects in a YAML file, which is a human-readable format. The engine compiles the configuration to work out dependencies, and resolves the environment and namespace to effectively manage the specified objects.

This operation can be divided into two distinct actions:

  • Compilation
  • Execution

The orchestration of the actions is controlled by the Snowflake Object Lifecycle Orchestrator. The orchestrator performs the specified action on the specified resource group.

Compilation

This action reads the user-specified configuration describing the Snowflake infrastructure and generates a machine-readable configuration that can be executed to enact those specifications.

This means collecting all configuration files, namespacing objects as per the environment, and resolving dependencies within objects. Upon successful compilation, resource definition and import files are generated and saved in resource-group-specific folders.

Execution steps can now be triggered to manage the Snowflake Objects.

Execution

The Execution phase of the Snowflake Object Lifecycle Engine covers inspecting Snowflake to get the state of Objects and applying changes where required.

Inspection collects the current state of the specified Snowflake Objects and saves it into a local state. This local state is then used to plan the changes required to make each Object match the user configuration.

Upon applying, the Snowflake Object Lifecycle Engine enacts the plan against the Snowflake infrastructure.

The actions that make up the Execution phase are described under Lifecycle Actions below.

Resource Groups

The Snowflake Object Lifecycle Engine divides Snowflake Objects into four resource groups. The execute action is performed on each resource group in sequential order.

The resource groups are:

ACCOUNT_LEVEL

This resource group refers to all Account-Level Snowflake Objects supported by the Snowflake Object Lifecycle Engine; see Supported Snowflake Objects for the full list.

The generated resource definition and import statement files are stored in a directory named account_level inside the cache directory. This resource group should run at the initial stages of a pipeline in parallel with databases.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to ACCOUNT_LEVEL commands the engine to execute actions such as PLAN and APPLY for Account-Level objects.
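For illustration, a minimal sketch of a pipeline job that plans Account-Level objects follows; the job name, stage name, and /dataops script entry point are assumptions about the surrounding DataOps project setup, not prescribed values:

Plan Account Level Objects:
  stage: Snowflake Setup
  variables:
    LIFECYCLE_ACTION: PLAN
    LIFECYCLE_MANAGE_OBJECT: ACCOUNT_LEVEL
  script:
    - /dataops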

DATABASE

This resource group is used just for the management of databases.

The generated resource definition and import statement files are stored in a directory named database inside the cache directory. This resource group should run at the initial stages of a pipeline in parallel with account_level.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to DATABASE commands the engine to execute actions such as PLAN and APPLY for Databases.

DATABASE_LEVEL

This resource group refers to all Database-Level Snowflake Objects supported by the Snowflake Object Lifecycle Engine; see Supported Snowflake Objects for the full list.

The generated resource definition and import statement files are stored in a directory named database_level inside the cache directory. This resource group should run after APPLY has been executed for both databases and account_level objects. This ensures that all referenced Account-Level objects already exist and that Database-Level objects created indirectly (as a result of a cloned database) are managed effectively.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to DATABASE_LEVEL commands the engine to execute actions such as PLAN and APPLY for Database-Level objects.

GRANT

This resource group refers to all privilege grants (and revokes) supported by the Snowflake Object Lifecycle Engine; see Supported Snowflake Objects for the objects that support privilege grants.

The generated resource definition and import statement files are stored in a directory named grants inside the cache directory. This resource group should run after APPLY has been executed for account-level, database, and database-level.

Setting the parameter LIFECYCLE_MANAGE_OBJECT to GRANT commands the engine to execute actions such as PLAN and APPLY for Grants.

Supported Parameters

  • LIFECYCLE_ACTION (REQUIRED): Must be one of COMPILE, VALIDATE, PLAN, APPLY, PLAN-DESTROY, DESTROY, AGGREGATE, AGGREGATE-DESTROY
  • DATAOPS_SOLE_ACCOUNT (REQUIRED): Snowflake account. If a region-less or organization account name is not used, the region must be included in this variable as well
  • DATAOPS_SOLE_USERNAME (REQUIRED): Snowflake username
  • DATAOPS_SOLE_PASSWORD (REQUIRED): Snowflake password
  • DATAOPS_SOLE_ROLE (REQUIRED): Snowflake role with which to run the queries
  • DATAOPS_SOLE_WAREHOUSE (OPTIONAL): Snowflake warehouse used for SOLE Grant Management
  • DATAOPS_DATABASE (REQUIRED): Name of the default database. Required if the default database is referenced in the configuration in the format {{env.DATAOPS_DATABASE}}
  • DATAOPS_PREFIX (REQUIRED): Prefix added to Account-Level objects and Databases (except the default database)
  • DATAOPS_ENV_NAME (REQUIRED): Suffix added to Account-Level objects and Databases (except the default database)
  • CONFIGURATION_DIR (OPTIONAL): Path of the directory in which the configuration files are present. Required if LIFECYCLE_ACTION is set to COMPILE
  • DATAOPS_NONDB_ENV_NAME (OPTIONAL): Suffix override. If specified, it overrides all branch-specific suffixes with the given value
  • DATAOPS_DATABASE_MASTER (OPTIONAL): Name of the PROD or master database. Required if the production database is referenced in the configuration via {{env.DATAOPS_DATABASE_MASTER}}
  • LIFECYCLE_MANAGE_OBJECT (OPTIONAL): Resource group to be managed. Required if LIFECYCLE_ACTION is one of PLAN, APPLY, PLAN-DESTROY, DESTROY. Must be one of ACCOUNT_LEVEL, DATABASE, DATABASE_LEVEL, GRANT
  • LIFECYCLE_RESOURCE_FILE (OPTIONAL, defaults to resources.tf.json): Name of the file that contains resource definitions
  • LIFECYCLE_IMPORT_FILE (OPTIONAL, defaults to resource_imports.json): Name of the file that contains resource import statements
  • PROVIDER_DIR (OPTIONAL, defaults to /snowflake-provider): Directory path of the provider configuration. The directory must contain a file named provider.tf listing the providers
  • DISABLE_PERSISTENT_CACHE (OPTIONAL): If set, disables persistent cache usage by the orchestrator. In that case, the user must handle the transfer of generated files between jobs
  • ARTIFACT_DIRECTORY (OPTIONAL, defaults to CI_PROJECT_DIR/snowflake-provider, where CI_PROJECT_DIR is the path of the repository): Path to which artifacts such as the resource file, import file, and state files are uploaded. The directory must also be set to upload as an artifact in the DataOps job
  • LIFECYCLE_STATE_RESET (OPTIONAL): If set, removes the local state file and re-initializes all defined and managed objects
  • DONT_USE_SOLE_GRANT_MANAGEMENT (OPTIONAL): If set, disables SOLE Grant Management for privilege/role grants and revokes and uses the underlying Terraform provider instead
  • DATAOPS_SOLE_OVERRIDE (OPTIONAL): If set, overrides the object delete/update prevention
  • DATAOPS_SOLE_PROD_BRANCH (OPTIONAL, defaults to master): Specifies the branch used as the PROD environment
  • DATAOPS_DEBUG (OPTIONAL): If set, enables debug logs for the Engine. Note that this prints potentially sensitive information, such as credentials, in the job logs
  • DATAOPS_SOLE_DEBUG (OPTIONAL): If set, enables debug logs for SOLE without exposing credentials in the logs
  • SOLE_LOG_CLEANUP (OPTIONAL): If set, removes all log files from the cache on the host system
  • SOLE_CACHE_CLEANUP (OPTIONAL): If set, removes all cache present on the host system
  • DATAOPS_PRESERVE_OBJECT_NAME (OPTIONAL): If set, enables special character support. Refer to Special Character Support before use
note

Setting DATAOPS_SOLE_DEBUG to any value enables debug mode for SOLE. This generates more log output about its internal processing, which is useful for debugging when a job is behaving unexpectedly or failing.

No credentials are logged when using DATAOPS_SOLE_DEBUG; credentials used by SOLE are masked with a fixed-length string of 16 X characters.

Configuration

The Snowflake Object Lifecycle Engine supports defining multiple files to read for Snowflake Object configuration. Users can specify their Snowflake configuration in YAML as a human-readable, declarative format.

The YAML configuration files have the following reserved keywords under which the Snowflake Object definitions must be defined:

  • account (Map/Dictionary): Defines the configuration of the Account
  • databases (Map/Dictionary): Defines the configuration of Databases
  • resource_monitors (Map/Dictionary): Defines the configuration of Resource Monitors
  • roles (Map/Dictionary): Defines the configuration of Roles
  • shares (Map/Dictionary): Defines the configuration of Shares
  • users (Map/Dictionary): Defines the configuration of Users
  • warehouses (Map/Dictionary): Defines the configuration of Warehouses
  • include (List): Defines a list of files to be included by the engine
  • specified_roles (List): Defines a list of Roles on which SOLE Grant Management can act
  • specified_users (List): Defines a list of Users on which SOLE Grant Management can act

Snowflake Object definition files are stored in the directory specified by the CONFIGURATION_DIR parameter. The directory can contain one or more files with either the .yml or .yaml file extension. You can specify all objects in a single file, but for better maintainability, manage them in separate files that logically group objects, e.g. all roles in one file, all users in another. The engine merges the object definitions from all files into a single declarative configuration.

dataops/snowflake/objects.yml
databases:
  DATAOPS_DATABASE:
    schemas:
      STAGING: ...

warehouses:
  INGESTION: ...
Multiple Definitions of Snowflake objects

If multiple files contain definitions of the same Snowflake Object, the engine only uses the configuration from the file read last; in other words, the earlier definitions are overridden.
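As a hypothetical illustration (the file names and the ANALYST role are invented for this example, and roles_b.yml is assumed to be read after roles_a.yml), only the second definition takes effect:

dataops/snowflake/roles_a.yml
roles:
  ANALYST:
    comment: "Read first - overridden"

dataops/snowflake/roles_b.yml
roles:
  ANALYST:
    comment: "Read last - this definition wins"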

Similar to DataOps pipeline files, the engine supports including other configuration files from different directories by using the keyword include and passing a list of files in it.

snowflake.yml
include:
  - "/snowflake/ingest/database.yml"
  - "/snowflake/sharing/database.yml"
Case-Insensitive object names

Snowflake Object names specified in the configuration are case-insensitive: warehouse_1 is converted to WAREHOUSE_1. If such an object is referenced as a parameter in another object, the name has to be given in upper-case; otherwise, the SOLE compiler assumes the reference points to a non-managed object.
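As a sketch of this behavior (the task and its parameters below are illustrative assumptions, not taken from a real project):

dataops/snowflake/objects.yml
warehouses:
  warehouse_1:     # compiled to WAREHOUSE_1
    comment: "Ingestion warehouse"

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        tasks:
          TASK_1:
            warehouse: WAREHOUSE_1     # upper-case reference resolves to the managed warehouse above
            schedule: 5 MINUTE
            sql_statement: SELECT 1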

Special Character Support

By default, SOLE enforces Snowflake identifier rules on object names, meaning:

  • Object names are converted to uppercase
  • Object names cannot start with a number
  • Only alphanumeric characters (A-Z and 0-9) and underscores (_) are supported in object names

Special character and case-sensitive identifier support can be enabled in SOLE with the parameter DATAOPS_PRESERVE_OBJECT_NAME.

note

SOLE currently doesn't support the following special characters in object identifiers, including table columns:

  • "
  • '
  • \
  • |
info

Snowflake currently doesn't support special characters as an identifier for the following objects:

  • Function
  • External Function
  • Procedure

When this parameter is set, object names defined in the SOLE config are used as-is, without any modification. If an object name is configured in lower-case, the case is preserved. This means that the object names object_1, OBJECT_1 and Object_1 would create three different objects, which have to be referenced and used as such.

tip

Snowflake converts object names to uppercase by default. To use case-sensitive object names (as well as special characters), the names have to be enclosed in double quotes (").

SELECT * FROM "database_1"."Schema_2"."Table-3";
caution

If you rely on SOLE converting object names to uppercase (database_1 -> DATABASE_1), enabling this flag disables that conversion and instructs SOLE that DATABASE_1 needs to be deleted and database_1 added.

This functionality also disables the object identifier rule verification performed by SOLE. If an unsupported identifier is used for an object, the APPLY job fails due to an error in Snowflake.

caution

Using this functionality leads to the creation of objects with special characters and case-sensitive identifiers. Such objects might not work out of the box with all orchestrators and all queries.

Access Management

Users can define privilege grants to roles in the configuration of a Snowflake Object. Using this, the engine handles the generation of all grants and revokes on the object.

As the Snowflake Object Lifecycle Engine follows a declarative method for defining the Snowflake Objects, the engine updates the objects to match the configuration. This means that only the specified privileges are granted to specified roles. Revokes are added implicitly for all unspecified privileges.

tip

The implicit revoke only takes effect when the keyword grants is used in the configuration of the object. If the keyword is not specified, the grants and revokes of privileges are not managed.

In the example below, the MODIFY privilege for role ROLE_1 is specified in the grants section of the configuration of DATABASE_1, while no grants section is defined for DATABASE_2.

dataops/snowflake/database.yml
DATABASE_1:
  comment: "DATABASE 1"
  grants:
    MODIFY:
      - ROLE_1
DATABASE_2:
  comment: "DATABASE 2"

Using this configuration, the engine would generate revokes on the privileges MONITOR, USAGE, CREATE SCHEMA, and IMPORTED PRIVILEGES for the database DATABASE_1. As no grants section is defined in the configuration of DATABASE_2, no revokes are performed for this database.

Revoking all Privileges

To manage revokes on all the privileges of a Snowflake Object, an empty grants section can be specified in the configuration of said Object.

dataops/snowflake/database.yml
DATABASE_3:
  comment: "DATABASE 3"
  grants:

Implicit Grant of USAGE Privilege on Parent Objects

If a privilege is granted to a role on a Database Object (Schema, Table, View, etc.) but USAGE is not granted on the parent objects, the role would not be able to access the Snowflake Object. In such cases, USAGE on the parents would have to be granted to the roles as well.

To reduce the number of configurations to be managed, the engine can implicitly grant USAGE on parent (and grandparent) objects to such roles.

dataops/snowflake/database.yml
DATABASE_5:
  comment: "Database 5"
  grants:
    MODIFY:
      - ROLE_1
    MONITOR:
      - ROLE_1
      - ROLE_2
  schemas:
    SCHEMA_1:
      comment: Hello
      grants:
        MODIFY:
          - ROLE_1
      tables:
        TABLE_1:
          grants:
            SELECT:
              - ROLE_1
              - ROLE_2
              - SYSADMIN
            INSERT:
              - ROLE_2
          comment: Hello_1
          COLUMNS:
            COLUMN_1:
              type: VARCHAR(16777216)

In the above example, on the table TABLE_1, SELECT is granted to the roles ROLE_1, ROLE_2 and SYSADMIN, and INSERT is granted to ROLE_2. As USAGE is not granted to any of these roles on either the schema SCHEMA_1 or the database DATABASE_5, the roles would not be able to access the table TABLE_1.

In this case, the engine would implicitly grant USAGE to roles ROLE_1, ROLE_2 and SYSADMIN on both SCHEMA_1 and DATABASE_5.

Manage Mode

The Snowflake Object Lifecycle Engine supports defining the manage_mode of a specified object. Configuring the manage mode allows users to specify which properties of an object are managed by the engine and which are ignored.

This is essential if only the grants of an object are to be managed, or if a child object (Table, Pipe) is managed but its parent (Schema) is not.

Example:

dataops/snowflake/database.yml
databases:
  DATABASE_1:
    manage_mode: all
    comment: "Test Database 1"
    grants:
      USAGE:
        - ROLE_1
        - SYSADMIN
  DATABASE_2:
    manage_mode: grants
    comment: "Test Database 2"
    grants:
      USAGE:
        - ROLE_1
        - SYSADMIN
  DATABASE_3:
    comment: "Test Database 3"
    manage_mode: none
    schemas:
      SCHEMA_1:
        comment: "Test Schema"

With this configuration,

  • for DATABASE_1, all specified properties would be managed. That means both the comment and defined grants.
  • for DATABASE_2, only the grants would be managed. That means only the defined grants; creation and updates to match database properties like the comment are ignored.
  • for DATABASE_3, none of the parameters would be managed. That means neither the comment nor defined grants.
  • for schema SCHEMA_1 defined inside the database, the default manage_mode all applies and the comment property is managed.

The engine supports the following manage modes:

All

When all the parameters of an object have to be managed, setting manage_mode to all informs the engine that both the resource and the grant definitions have to be generated for the object.

As all is the default value for manage_mode, it does not have to be explicitly defined.

Grants

If only the defined grants have to be managed for the object, setting manage_mode to grants informs the engine that only grant definitions, or revokes thereof, have to be generated for the object.

None

For database-level objects, such as Schema, Table, Stage, etc., their parent object has to be defined in the configuration. Schemas have to be defined inside a database. Tables, Stages, Masking Policies, etc. have to be defined inside a schema.

In case only a child object has to be managed, setting manage_mode to none for the parent object informs the engine that neither the resource nor the grant definition has to be generated for it. Definitions are only generated for the child objects, as in the sketch below.
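A minimal sketch, assuming a pre-existing database that SOLE should leave untouched while managing a new schema inside it (the object names are illustrative):

dataops/snowflake/database.yml
databases:
  EXISTING_DB:
    manage_mode: none     # neither resource nor grant definitions are generated for the database
    schemas:
      NEW_SCHEMA:         # only this child object is managed
        comment: "Managed by SOLE"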

info

manage_mode has no option to manage only the parameters of an object and ignore the grants, as this can be achieved by simply not defining grants in the configuration. See the Access Management section for more information.

State Management

Import, Creation and Update save the current state of the Snowflake objects into the local state. This local state is saved in the persistent cache of the DataOps runner's host system.

This allows the engine to use the existing state of the objects for subsequent pipelines.

Each environment and branch has its own set of states. Furthermore, each resource group has its version of the state.

The state file is present on the host system at the following path: /<cache_directory>/persistent_cache/<dataops_project_name>/<branch_name>/snowflakelifecycle-runner/<resource_group>/<resource_group>.tfstate

The values in <> refer to:

  • cache_directory: Path on the host system where the runner's cache is stored
    • The default is /agent_cache
    • To get the value for your DataOps runner, see the volume mounts in /srv/<agent_name>/config/config.toml
    • Refer to the initial setup of the DataOps runner
  • dataops_project_name: Name of the project in lower-case
  • branch_name: Name of the branch in lower-case
  • resource_group: Name of the resource group
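As a hypothetical example, assuming the default cache directory, a project named my-project, the branch dev, and the DATABASE_LEVEL resource group, the state file would be at:

/agent_cache/persistent_cache/my-project/dev/snowflakelifecycle-runner/database_level/database_level.tfstate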
note

The state is not maintained for Grants.

State Reset

If the state file is corrupted or does not match Snowflake, failures can occur in pipelines.

In such scenarios, the local state file can be reset to perform a fresh inspection of Snowflake. Doing so deletes the existing local state for the specified resource group and imports all managed Snowflake Objects again.

The reset can be triggered by the variable LIFECYCLE_STATE_RESET.

The state reset can be configured either at the project level in pipelines/includes/config/variables.yml, to reset the state of all resource groups, or at the individual resource group level as a variable on the PLAN job for that resource.
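A minimal sketch of the project-level option, assuming the standard layout of the variables file:

pipelines/includes/config/variables.yml
variables:
  LIFECYCLE_STATE_RESET: 1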

info

Individual resource group state reset is not supported in Aggregate Jobs

In-Place Changes

Snowflake uses aliases for data types, such as TEXT for VARCHAR(16777216). Using such aliases in the configuration leads to the orchestrator logging in-place changes. To avoid these in-place changes, use the exact data types in the configuration as well.
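For example, an illustrative column definition using the exact type rather than its alias:

tables:
  TABLE_1:
    COLUMNS:
      COLUMN_1:
        type: VARCHAR(16777216)     # using the alias TEXT here would be reported as an in-place change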

For Resource Monitors, the Snowflake docs state that a TIMESTAMP can be used for the parameter start_timestamp, but using a timestamp leads to an error when creating or updating a resource monitor. Currently, a resource monitor only accepts a DATE as a valid value at creation time, yet upon successful creation the value is converted to a timestamp. This leads to the engine marking the resource monitor as having an in-place change to replace the TIMESTAMP with a DATE again.

Namespace and Environment Management

The Snowflake Object Lifecycle Engine allows control over namespacing and environment for all account-level objects. Namespacing separates objects in different branches (for example, QA, dev, and feature branches) from each other so that each branch can work on its objects.

Environment Management controls the environments in which objects are managed. This feature ensures that objects created in a specific environment, for example, Prod or QA, are only accessed and managed by the environment that created them.

Namespace Management controls the addition of prefixes and suffixes to Account-Level objects and Databases, except the default database.

Environment Management

Environment Management control can be applied to Account-level objects. This allows configuring under which environment an object is managed by the engine.

This can be set by adding environment in the properties section of an account-level object and setting its value to one of the environment names, typically one of PROD, QA, or DEV.

Example:

dataops/snowflake/objects.yml
roles:
  ADMIN:
    environment: PROD
  QA:
    environment: QA
databases:
  DB:
    environment: DEV

With this configuration,

  • Role ADMIN is only managed when the environment is PROD (master branch)
  • Role QA is only managed when the environment is QA (qa branch)
  • Database DB is only managed when the environment is DEV (dev branch)

If the environment parameter is not specified, the object is created in all environments.

danger

Setting the environment: property on an object that already exists in the configuration causes that object to be removed from other environments EVEN IF the namespacing is set to prevent deletion. For example, setting environment: PROD causes the object to be removed in feature, QA, and dev branches.

To avoid this, set the LIFECYCLE_STATE_RESET variable for the first pipeline execution after setting the environment: property. This clears the SOLE state file, so SOLE no longer removes the object. This applies to any branch that the change to environment: is merged into; for example, if changes from the dev or master branch are merged into a feature branch, LIFECYCLE_STATE_RESET should be set on at least the next pipeline run on that feature branch.

Namespace Management

Specifying namespacing controls whether a prefix and/or a suffix is added to Account-Level objects and Databases (except for the default database).

Set namespacing in the properties section of an account-level object to one of the following values:

  • both (default)
    • Both the prefix and the suffix are added to the name of a Snowflake object
    • If the namespacing property is not defined, this is the default
  • none
    • Neither the prefix nor the suffix is added to the name of the Snowflake object
    • Object Deletion: setting namespacing to none prevents deletion of an object. Refer to Environment specific Snowflake Objects for the reason
  • prefix
    • Only the prefix is added to the name of a Snowflake object
  • suffix
    • Only the suffix is added to the name of a Snowflake object
    • If the parameter DATAOPS_NONDB_ENV_NAME is set, its value is used as the suffix instead
  • external
    • Neither the prefix nor the suffix is added to the name of a Snowflake object
    • The object is managed completely (creation, update, and deletion), but no namespacing is applied
    • external can be used if the object names are managed outside the engine, for example by a script running before SOLE that sets object names to follow a predefined naming convention
    • Setting namespacing to external allows complete lifecycle management without adding the suffix or the prefix

Environment specific Snowflake Objects

Adding a suffix to a name scopes the object to the environment. This notifies SOLE that the object's lifecycle (creation, update and deletion) can be completely managed.

If a suffix is not set via namespacing, SOLE assumes the object is shared between environments. As deleting a shared object could have repercussions, SOLE does not delete such objects.

Example
dataops/snowflake/objects.yml
roles:
  ADMIN:
    namespacing: none
  QA:
    namespacing: suffix
databases:
  DB_1:
    namespacing: both
  DB_2:
    namespacing: prefix

With DATAOPS_PREFIX set to DATAOPS and DATAOPS_ENV_NAME set to PROD, the generated names are:

  • Role ADMIN remains as it is: ADMIN
  • Role QA is converted to QA_PROD
  • Database DB_1 is converted to DATAOPS_DB_1_PROD
  • Database DB_2 is converted to DATAOPS_DB_2

Database-Level Object Reference

As the engine operates on multiple databases and schemas as part of lifecycle management, it doesn't set a database or schema for the session when connecting to Snowflake. Because of this, referencing Database-Level objects requires specifying the complete path/namespace to the object.

If the referenced Database is managed and environment controlled, the environment-based namespace is resolved by the engine when referencing the object.

Such Database-Level references could be provided in one of three ways:

  • by Object name
  • by Object and Schema name
  • by Object, Schema, and Database name

Object Reference by Object Name

This can be used when the referenced object belongs to the same schema and database.

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        pipes:
          PIPE_1:
            copy_statement:
              into:
                table: "TABLE_1"
              from:
                stage: "STAGE_1"

If Prefix is set to DATAOPS and Suffix is set to PROD:

  • The fully qualified name for the table would be DATAOPS_DB_1_PROD.SCHEMA_1.TABLE_1
  • The fully qualified name for the stage would be DATAOPS_DB_1_PROD.SCHEMA_1.STAGE_1

Object Reference by Object and Schema name

This can be used when the referenced object belongs to a different schema but is part of the same database.

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        pipes:
          PIPE_1:
            copy_statement:
              into:
                schema: SCHEMA_2
                table: "TABLE_1"
              from:
                schema: SCHEMA_3
                stage: STAGE_1

If Prefix is set to DATAOPS and Suffix is set to PROD:

  • The fully qualified name for the table would be DATAOPS_DB_1_PROD.SCHEMA_2.TABLE_1
  • The fully qualified name for the stage would be DATAOPS_DB_1_PROD.SCHEMA_3.STAGE_1

Object Reference by Object, Schema, and Database name

This can be used when a referenced object belongs to a different database.

databases:
  DB_1:
    schemas:
      SCHEMA_1:
        pipes:
          PIPE_1:
            copy_statement:
              into:
                database: DB_2
                schema: SCHEMA_2
                table: "TABLE_1"
              from:
                database: DB_3
                schema: SCHEMA_3
                stage: STAGE_1

Assuming that DB_2 is managed while DB_3 is not; if Prefix is set to DATAOPS and Suffix is set to PROD:

  • The fully qualified name for the table would be DATAOPS_DB_2_PROD.SCHEMA_2.TABLE_1
  • The fully qualified name for the stage would be DB_3.SCHEMA_3.STAGE_1

Even if the referenced object belongs to the same database and schema, explicit names can be provided as well.

Connection Parameters

To establish a connection to a Snowflake account, SOLE requires the following parameters:

Account

This is the name of the account in which SOLE operates.

This value is set in the variable DATAOPS_SOLE_ACCOUNT. If the value is not present in the variables, SOLE tries to obtain the value from the DataOps vault at path SNOWFLAKE.SOLE.ACCOUNT.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

The full account name, along with the region (e.g. eu-west-2), has to be defined in DATAOPS_SOLE_ACCOUNT. Organizational short names and region-less account names can be used in DATAOPS_SOLE_ACCOUNT as well.

To get the region-less account name for your Snowflake account, the following query can be executed:

SELECT system$whitelist();

The query returns a JSON result with a key SNOWFLAKE_DEPLOYMENT_REGIONLESS whose value is formatted as abcdef-qwerty.snowflakecomputing.com. The complete abcdef-qwerty part can be specified as the DATAOPS_SOLE_ACCOUNT.

Username

This is the username with which SOLE connects to the account specified in DATAOPS_SOLE_ACCOUNT.

This value is set in the variable DATAOPS_SOLE_USERNAME. If the value is not present in the variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.USERNAME.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

Password

The Password of the username specified in DATAOPS_SOLE_USERNAME.

This value is set in the variable DATAOPS_SOLE_PASSWORD. If the value is not present in environment variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.PASSWORD.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

Role

The Role of the Snowflake User as defined in DATAOPS_SOLE_USERNAME, which runs the SOLE queries. This value has to be defined even if the role is the default role for the Snowflake User.

This value is set in the variable DATAOPS_SOLE_ROLE. If the value is not present in the variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.ROLE.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.
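The four connection values above can also be kept together in the DataOps vault. A minimal sketch of such vault content, assuming the vault stores these entries as nested YAML keys (all values are placeholders):

SNOWFLAKE:
  SOLE:
    ACCOUNT: abcdef-qwerty
    USERNAME: SOLE_USER
    PASSWORD: <placeholder>
    ROLE: SOLE_ADMIN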

Terraform Variables

As SOLE utilizes Terraform, the credentials/variables are duplicated as Terraform variables (TF_VAR_ is prepended to each variable name). For example, DATAOPS_SOLE_ACCOUNT is duplicated as TF_VAR_DATAOPS_SOLE_ACCOUNT, and similarly for the other credentials. If such a variable already has a value, it is overridden.

Warehouse

The Snowflake warehouse used to run queries for SOLE Grant Management is defined in DATAOPS_SOLE_WAREHOUSE. This is an optional configuration. If not defined, SOLE Grant Management assumes that the user specified in DATAOPS_SOLE_USERNAME has a default warehouse set.

This value is set in the variable DATAOPS_SOLE_WAREHOUSE. If the value is not present in environment variables, SOLE tries to set the value from the DataOps vault path SNOWFLAKE.SOLE.WAREHOUSE. If no such vault path exists, SOLE Grant Management would run queries with the assumption that the default warehouse is set.

The value of the variable can also be set using further DataOps Vault functionality if users do not wish to expose credentials in configuration.

SOLE Grant Management

For privilege/role grant management, SOLE provides a full alternative to Terraform. SOLE Grant Management uses the same configuration as the Terraform Snowflake provider to inspect, compare, and execute operations.

SOLE Grant Management is enabled by default. To disable it and revert to using the Terraform provider to manage grants, the variable DONT_USE_SOLE_GRANT_MANAGEMENT must be set.

SOLE Grant Management requires a warehouse to perform object inspection. The warehouse can either be specified in the variable DATAOPS_SOLE_WAREHOUSE or can be saved in the DataOps vault path SNOWFLAKE.SOLE.WAREHOUSE. If no Warehouse is defined, SOLE Grant Management would connect with the assumption that a default warehouse has been set for the user.

Default Warehouse

SOLE Grant Management requires that a default warehouse is assigned to the Snowflake user used to connect to Snowflake. To set a default warehouse for the user, the following query can be run in Snowflake:

ALTER USER <SOLE_USER> SET DEFAULT_WAREHOUSE = <WAREHOUSE_NAME>;

SOLE Grant Management inspects Snowflake at the time of running PLAN. It compares the grants defined in the configuration with those in Snowflake and works out which privilege/role grants are to be added or revoked. Due to this run-time inspection, no local state is required.

SOLE Grant Management also supports restricting the Roles and Users it operates on. In the configuration, these can be defined under the keys specified_roles and specified_users respectively. When defined, SOLE Grant Management only operates on Roles and Users that match the listed values. Either object names or regular expressions can be provided in specified_roles and specified_users.

Example
specified_roles:
  - DATAOPS_ADMIN
  - DATAOPS_*

specified_users:
  - SOLEADMIN
  - DATAOPS_*

With the above configuration, SOLE Grant Management would only grant or revoke a privilege if the role is either DATAOPS_ADMIN or matches the regex DATAOPS_*. Similarly, a role would only be granted or revoked if:

  • the Role is either DATAOPS_ADMIN or matches the regex DATAOPS_*, or
  • the User is either SOLEADMIN or matches the regex DATAOPS_*

Production Database and SOLE User Deletion Prevention

By default, the SOLE PLAN and COMPILE jobs fail the pipeline execution to prevent the deletion of a production database or the alteration of the user SOLE needs to connect to Snowflake.

Production Database Deletion Prevention

A database can be accidentally deleted in the PROD environment if the corresponding configuration is removed or commented out. This could affect the pipeline or other environments. To prevent such accidental deletion, SOLE fails the PLAN job if any database deletion is detected.

This prevention is only implemented in the PROD environment (master branch).

note

If your project has another branch set up as the PROD environment, this can be set with the environment variable DATAOPS_SOLE_PROD_BRANCH.

Deletion Prevention can be disabled by setting the variable DATAOPS_SOLE_OVERRIDE to any value.

SOLE User Deletion Prevention

SOLE doesn't allow management of the user required to connect to Snowflake. If a configuration for this user is found, the COMPILE job fails. This prevents accidental updates to the user's password or deletion of the user.

Deletion Prevention can be disabled by setting the variable DATAOPS_SOLE_OVERRIDE to any value.

danger

Setting the variable DATAOPS_SOLE_OVERRIDE may lead to the deletion of critical Snowflake Objects such as PROD databases and the SOLE user. A value should only be set if absolutely necessary.

Lifecycle Actions

COMPILE

When LIFECYCLE_ACTION is set to COMPILE, the engine generates the configuration required to initialize and manage the Snowflake objects from the user-defined specifications.

Using the specifications defined in the files, the engine performs validation and dependency resolution and generates resource configurations and syntax. During COMPILE, the engine also resolves the grants and revokes on Snowflake Objects. For more information, see the Access Management section.

The following parameters can be used to control the execution of COMPILE:

  • CONFIGURATION_DIR (REQUIRED): Path of the directory in which the configuration files are present
  • DATAOPS_PREFIX (REQUIRED): Prefix added to Account-Level objects and Databases (except the default database)
  • DATAOPS_ENV_NAME (REQUIRED): Suffix added to Account-Level objects and Databases (except the default database)
  • DATAOPS_DEBUG (OPTIONAL): If set, enables debug logs for the Engine

VALIDATE

When LIFECYCLE_ACTION is set to VALIDATE, the engine performs validation of the resource configurations generated during the COMPILE stage.

PLAN

When LIFECYCLE_ACTION is set to PLAN, the engine performs planning of execution steps to create or update the managed Snowflake Objects.

Before generating a plan, the engine tries to import all specified Snowflake Objects that already exist. This import initializes or updates the local state file for existing Snowflake Objects, which is then used to generate the plan. An import summary is displayed during and after completion of the import.

Due to aliases and multiple format support, a plan might have in-place changes for Snowflake Objects. Refer to In-Place Changes for more information.

If the state file is corrupted, or if there is a discrepancy in it (an Object is managed in the state but was deleted from Snowflake externally), the file can be removed and re-initialized from the current Snowflake Object configuration. To do so, set the variable LIFECYCLE_STATE_RESET to any value. This can be set for the whole pipeline, to re-initialize the state for all resource groups, or individually on any PLAN job to reset the state for just that resource.

The generated plan is provided to the APPLY action.

PLAN-DESTROY

Similar to PLAN, when LIFECYCLE_ACTION is set to PLAN-DESTROY, the engine performs planning of execution steps to destroy the managed Snowflake Objects.

Using the existing state (if any), an execution plan to destroy the managed Snowflake Objects is generated and saved. The execution plan lists the changes it will perform to delete/drop the managed Snowflake Objects.

The generated plan is provided to the DESTROY action.

APPLY

When LIFECYCLE_ACTION is set to APPLY, the engine executes the actions specified by the PLAN action.

Using the output from the plan, the engine executes the defined steps. This ensures that the engine only performs those changes that were logged by the PLAN action. This implies that the user is aware of all the changes to be made.

DESTROY

When LIFECYCLE_ACTION is set to DESTROY, the engine executes the actions specified by the PLAN-DESTROY action.

Using the output from PLAN-DESTROY, the engine executes the defined deletion steps. This ensures that the engine only performs those changes that were logged by the PLAN-DESTROY action. This implies that the user is aware of all the changes to be made.

AGGREGATE

As an alternative to setting up individual jobs for each resource group and action, an aggregate action can be configured that performs COMPILE on the configuration and then VALIDATE, PLAN, and APPLY for all resource groups sequentially in a single job.

This cuts the number of jobs needed in a pipeline to manage the lifecycle of Snowflake Objects down to a single job.

This can be executed by setting LIFECYCLE_ACTION to AGGREGATE.

As AGGREGATE acts on all resource groups, LIFECYCLE_MANAGE_OBJECT does not have to be specified; however, other variables such as CONFIGURATION_DIR are still required. A sketch of such a job follows.
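A minimal sketch of an aggregate job, with the same assumptions about the job name, stage, and script entry point as the earlier Account-Level example:

Snowflake Lifecycle:
  stage: Snowflake Setup
  variables:
    LIFECYCLE_ACTION: AGGREGATE
    CONFIGURATION_DIR: $CI_PROJECT_DIR/dataops/snowflake
  script:
    - /dataops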

AGGREGATE-DESTROY

Similar to AGGREGATE, AGGREGATE-DESTROY performs the PLAN-DESTROY and DESTROY actions in a single job.

This can be executed by setting LIFECYCLE_ACTION to AGGREGATE-DESTROY.

Supported Snowflake Objects

The following is the list of Snowflake Objects whose lifecycle can be managed with the Snowflake Object Lifecycle Engine. For more information on configuration, supported parameters, and examples, see the page for each object.