DataOps Snowflake Object Lifecycle Engine
Overview
We have introduced SOLE for Data Products as a new framework for the Snowflake Object Lifecycle Engine (SOLE) to help you easily build an ecosystem of data products and benefit from greater accuracy and accessibility of data. The major difference is how you define Snowflake objects in the SOLE configuration. To enable and use this functionality, follow the steps detailed in Enabling SOLE for Data Products.
To address potential MATE job issues arising from the new default dbt version 1.5, see the dbt 1.5 Migration documentation.
There are two approaches for building data products with DataOps.live:
- DataOps.live Create: This approach automates data product building, management, and deployment.
- Data product implementation: This involves building each component from scratch, where you use pipelines to create, test, deploy, and update standalone or composite data products in your DataOps projects.
Why SOLE for Data Products?
SOLE for Data Products addresses two challenges:
- It simplifies the management of Snowflake object configuration in large infrastructures by treating data as a product.
- It allows the grouping of data products by domains to facilitate consuming data for anybody in an organization who needs data for a particular function. For example, a finance team needs its own set of objects like databases, roles, schemas, etc., while a marketing person may need product-related objects.
In this new version of SOLE, you can relate object definitions to domains, where a domain can be a team. This lets you organize files in whatever way best fits your organization.
The table below summarizes the differences between the current SOLE and SOLE for Data Products.
Features and Characteristics | Current SOLE | SOLE for Data Products | Benefits/Comment |
---|---|---|---|
Object definitions | Objects are grouped by object type, and all configurations must be in the same file. If an object is schema-level, it must be in the same file as its database. | Object configurations are self-describing and don't have to be in the same file under the parent's configuration. | Break away from the nested structure and organize files into type-specific or domain-specific categories. Learn more about flat vs tree structure definitions. |
File structure | Organize all object-specific configurations in a single file following the hierarchical data tree structure | Organize object configurations in separate files and in any structure that meets your organization's needs | Support file structures that match how data products are set up in a domain. Learn more about file structure. |
Components (reuse object configuration) | Not applicable | Define code blocks of YAML configuration as part of any object definition and reuse them in any file | Reduce the duplicates required when configuring objects such as grants on a table or common Jinja templates. Learn more about components. |
Object relations | Not applicable | Define the relationship between managed objects by using the rel() function | Make the relation between managed entities more explicit. This reduces the chances of SOLE making assumptions about the implied configuration and focuses more on the specified configuration. Learn more about the rel function. |
Directory defaults | Not applicable | Use the project configuration file to define default values that are applied to all objects within a directory and, recursively, within all its subdirectories | Apply parameters to all objects in a directory, avoiding the need to repeat the database name, schema name, and similar properties in every schema-level object and keeping configurations short. Learn more about directory defaults. |
Name collision and dataops.id | Not applicable | Assign a unique ID for an object and use it to resolve the relationship in case of conflicts | Resolve the relationship defined using the rel() function in case multiple matches are found for an object. Learn more about unique SOLE IDs. |
Enabling SOLE for Data Products
To enable SOLE for Data Products, add the file `dataops_config.yml` to the snowflake configuration directory.
This switches SOLE from the current configuration to SOLE for Data Products.
The current configuration structure and SOLE for Data Products are mutually exclusive. If SOLE for Data Products is enabled and configuration for the current structure is found, SOLE will throw an error.
The new configuration file `dataops_config.yml` helps you to:
- Define defaults for directories.
- Specify folders that contain non-object configuration.
- Specify the config version.
Directory defaults
You can use the project configuration file to define defaults for a directory. This lets you apply some parameters for all objects present in a directory.
This is supported by the following elements:

- Object parent:
  - `database`
  - `schema`
- SOLE keywords:
  - `namespacing`
  - `manage_mode`
  - `environment`
The defaults can be overridden following the increasing order of precedence below. For example, the value in the object definition overrides that in the component (see the sketch after this list):
- Default specified for subdirectory
- Value specified in the included component
- Value specified in the object definition
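For example, here is a minimal sketch of this precedence, assuming a hypothetical subdirectory `reporting` and illustrative schema names (`reporting/tables.yml` is an assumed file path, not a requirement):

```yaml
# dataops_config.yml: every object under reporting/ defaults to REPORTING_SCHEMA
reporting:
  +schema: rel(schema.REPORTING_SCHEMA)
```

```yaml
# reporting/tables.yml: this object definition overrides the directory default
- table:
    name: DAILY_SALES
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    schema: rel(schema.FINAL_SCHEMA) # wins over the +schema directory default
```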
Non-object configuration
SOLE for Data Products supports specifying configurations other than object definitions, such as hooks, runtime configuration, and the new Components functionality.
You must store these special configurations in dedicated files or directories so that they are skipped during object validation. Use the following keys to define the directories and files:
- Hooks: `hook-paths`
- Configuration: `configuration-paths`
- Components: `component-paths`
Config version
Project configuration in SOLE for Data Products will be further improved over time, with each major improvement added incrementally. To support this evolution, the project configuration file adds a new parameter, `config-version`.
Currently, this parameter only supports the value `2`, but over time, new features will be added as `2.1`, `2.2`, etc., allowing a seamless transition to new configuration functionality.
Following is an example of `dataops_config.yml` with all the above-mentioned parameters:

```yaml
config-version: 2
configuration-paths: [configuration]
component-paths: [components]
hook-paths: [hooks.yml]

ingestion_objects:
  +database: rel(database.{{ env.DATAOPS_DATABASE }})
  +schema: rel(schema.INGESTION_SCHEMA)
  fivetran:
    +schema: rel(schema.FIVETRAN_SCHEMA)

transformation_objects:
  +database: rel(database.{{ env.DATAOPS_DATABASE }})
  +schema: rel(schema.TRANSFORMATION_SCHEMA)
  namespacing_prefix:
    +namespacing: prefix
```
Flat object definition
The current version of SOLE requires Snowflake object configuration to be grouped by object type. This means that if you have multiple roles, you must define all of them in a single YAML file under the `roles` key. This becomes more cumbersome when you define schemas or schema-level objects, as these must be nested in a tree structure, which is a challenge for organizations that manage a large number of objects in SOLE.
To address this challenge, we have made a few major changes to how you define objects. Rather than grouped collections of objects, SOLE for Data Products uses modular, self-describing, and explicit object definitions. Every definition specifies the object type and its relations to other objects, such as its parents.
Self-describing objects
Object configuration in SOLE for Data Products is self-describing: anyone looking at just the object definition can immediately tell the following:
- Object type
- Object name
- Object dependencies
- Object relations
The following example shows this behavior:

```yaml
- warehouse: # Type of the object
    dataops.extends:
      - component.default_warehouse # Including a component named 'default_warehouse'
    comment: Warehouse for Ingestion operations
    name: DATAOPS_INGESTION # Name of the object
    resource_monitor: rel(resource_monitor.INGESTION) # Relation to the managed resource monitor INGESTION
    grants:
      USAGE:
        - rel(role.WRITER) # Relation to the managed role WRITER
      MONITOR:
        - SYSADMIN # Non-managed role SYSADMIN
        - rel(role.WRITER)
```
Break away from the nested structure
With SOLE's new structure for defining objects, child objects don't have to be in the same file under the parent's configuration. Each child object defines the details of the parent object as part of the configuration. This allows you to specify the configuration for hierarchical objects in entirely separate files.
The following configurations show how to define roles and a database at the account level, and then a schema, table, and stage at the database level, in separate files to match your organization's needs. In this example, objects are organized in object-specific directories, but you can use whatever structure suits you best.
```
dataops/snowflake/
dataops/snowflake/databases/
dataops/snowflake/schemas/
dataops/snowflake/tables/
dataops/snowflake/stages/
```
```yaml
- role:
    name: READER
    environment: PROD
    namespacing: prefix
    roles:
      - rel(role.WRITER)

- role:
    name: WRITER
    environment: PROD
    namespacing: prefix
    roles:
      - rel(role.ADMIN)

- role:
    name: ADMIN
    environment: PROD
    namespacing: prefix
```

```yaml
- database:
    name: "{{ env.DATAOPS_DATABASE }}"
    {% if (env.DATAOPS_ENV_NAME != 'PROD' and env.DATAOPS_ENV_NAME != 'QA') %}
    # If the database name and from_database values differ, the new database is
    # cloned; otherwise, it is created with the specified definition
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    data_retention_time_in_days: 1
    {% endif %}
    comment: This database is used for demonstration of the usage of the Snowflake Object Lifecycle Engine
    grants:
      USAGE:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
      MODIFY:
        - rel(role.WRITER)
        - rel(role.ADMIN)
      MONITOR:
        - rel(role.ADMIN)
        - SYSADMIN
      CREATE SCHEMA:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
```

```yaml
- schema:
    dataops.extends:
      - component.default_schema_grants
    name: ADVENTUREWORKS_HUMANRESOURCES
    database: rel(database.{{ env.DATAOPS_DATABASE }})
```

```yaml
- table:
    name: JOBCANDIDATE
    schema: rel(schema.ADVENTUREWORKS_HUMANRESOURCES)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    comment: "JOBCANDIDATE Table"
    dataops.extends: component.ingestion_table_grants
    columns:
      JOBCANDIDATEID:
        type: NUMBER(38,0)
        comment: "Primary key of the job candidate table"
      BUSINESSENTITYID:
        type: NUMBER(38,0)
        comment: "The ID of the business entity"
      RESUME:
        type: VARCHAR(16777216)
        comment: "Resume of the candidate"
      MODIFIEDDATE:
        type: TIMESTAMP_NTZ(9)
        comment: "Timestamp when the record was modified"
```

```yaml
- stage:
    name: S3_STAGE
    schema: rel(schema.ADVENTUREWORKS_HUMANRESOURCES)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    comment: Created to load the AdventureWorks data
    url: s3://dataops-demo/adventureworks/
    CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
    file_format: rel(file_format.ADVW_UTF8_TAB_PUB)
    grants:
      ALL PRIVILEGES:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
      USAGE:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
```
File structure
SOLE for Data Products gives you the freedom to organize object definitions however you like, in whatever structure meets your organization's needs.
As the name suggests, SOLE for Data Products supports file structures that match how data products are set up in a domain. You can set up top-level directories for each data product in a domain (maintaining one project per domain is recommended). Objects belonging to a data product can then have their definitions in that directory. You can also organize data product directories into further subdirectories representing an action, business logic, or object type, as sketched below.
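For illustration, a hypothetical domain-oriented layout could look like this (all directory and file names below are assumptions, not requirements):

```
dataops/snowflake/
  sales_analytics/        # one data product
    roles.yml
    databases.yml
    ingestion/            # subdirectory per action
      stages.yml
      tables.yml
    transformation/
      tables.yml
  customer_360/           # another data product in the same domain project
    databases.yml
    schemas.yml
```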
Components
As many objects have a similar configuration, whether it is the grants on a table or file format properties, it makes sense to move such a configuration to a code block and reuse it. To answer this need, SOLE for Data Products introduces components that are pieces of YAML configuration you can use as part of any object definition.
Components are based on YAML anchors but without their limitation that the anchor and its usage must be in the same file. Components in SOLE for Data Products can be defined in one file and used in any other.
Modular configuration
Components are designed to serve as modular configurations you can use in any object definition — assuming that the object supports the specified parameters. A Component can be as small as a single parameter or all parameters of an object. You can then use components directly in an object definition or combine them with other components.
Following are a few examples of components:

```yaml
namespacing_prefix:
  namespacing: prefix

namespacing_none:
  namespacing: none

ingestion_table_grants:
  grants:
    SELECT:
      - rel(role.READER)
      - rel(role.WRITER)
      - rel(role.ADMIN)
    INSERT:
      - rel(role.WRITER)
      - rel(role.ADMIN)
    UPDATE:
      - rel(role.WRITER)
      - rel(role.ADMIN)
    DELETE:
      - rel(role.ADMIN)

default_warehouse:
  grants:
    USAGE:
      - rel(role.WRITER)
      - rel(role.READER)
      - rel(role.ADMIN)
    MONITOR:
      - SYSADMIN
      - rel(role.ADMIN)
  warehouse_size: X-SMALL
  auto_suspend: 40
  auto_resume: true
  namespacing: none
```
Object definitions are written as list items, whereas component definitions are configured with a key. This clear distinction between the two keeps the configuration self-describing.
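As a minimal sketch of this distinction (the component name `default_comment` and the sequence `ORDER_SEQ` are illustrative):

```yaml
# A component: a keyed block of reusable configuration
default_comment:
  comment: Managed by SOLE
```

```yaml
# An object: a self-describing list item that includes the component
- sequence:
    name: ORDER_SEQ
    dataops.extends:
      - component.default_comment
```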
Using components
As components are not Snowflake object configurations, they must be organized in a dedicated directory. You must define this directory in the project configuration file `dataops_config.yml` with the key `component-paths`.
To use components:

- Add a new parameter `dataops.extends` as part of the object or component definition. This parameter accepts either a string for a single component or a list for multiple components.
- Use the format `component.<component_name>` to include the components.

```yaml
- warehouse:
    dataops.extends:
      - component.warehouse_properties
      - component.warehouse_grants
    name: TRANSFORMATION
    comment: Warehouse for Transformation operations

- file_format:
    name: CSV_FORMAT
    schema: rel(schema.INGESTION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    dataops.extends: component.csv_file_format
```
Reusing components
In the previous section, you saw how components are included in an object definition.
As components are meant to be modular configurations, you can include groups of parameters in different components and use them as building blocks that you can include in other components:
```yaml
warehouse_grants:
  dataops.extends:
    - component.warehouse_usage_grants
    - dataops.merge: component.warehouse_monitor_grants
    - dataops.merge: component.warehouse_operate_grants

warehouse_usage_grants:
  grants:
    USAGE:
      - rel(role.READER)

warehouse_monitor_grants:
  grants:
    MONITOR:
      - rel(role.ADMIN)

warehouse_operate_grants:
  grants:
    OPERATE:
      - rel(role.ADMIN)
```
Overriding values
The `dataops.extends` keyword supports both a string, to include a single component, and a list, to include multiple components. If multiple components are included and two or more of them define the same parameter, the order in which the components are included determines the final value.
The value is overridden as you go down the list of included components, meaning the value specified in the last included component is used:
```yaml
warehouse_x_small:
  warehouse_size: XSMALL

warehouse_small:
  warehouse_size: SMALL

warehouse_medium:
  warehouse_size: MEDIUM

warehouse_large:
  warehouse_size: LARGE

warehouse_2x_large:
  warehouse_size: X2LARGE

warehouse_properties:
  dataops.extends:
    - component.warehouse_small
    - component.warehouse_2x_large
    - component.warehouse_x_small
    - component.warehouse_medium
    - component.warehouse_large
  # warehouse_size will be LARGE, as the last included component
  # is component.warehouse_large

warehouse_properties_override:
  dataops.extends:
    - component.warehouse_small
    - component.warehouse_2x_large
    - component.warehouse_x_small
    - component.warehouse_medium
    - component.warehouse_large
  warehouse_size: SMALL # warehouse_size will be SMALL, as the current
                        # component overrides the included values
```
Parameters defined directly in the object definition always override values from included components.
Merging values
Scalar values such as strings, integers, and booleans are always overridden when including a component. For list or map values, SOLE for Data Products supports merging in addition to overriding. Merging combines values from different sources into one final value.
By default, all values and components are overridden. To enable merging, you must use `dataops.merge`.
Merging components
To merge components, define the component to be included using `dataops.merge: component.<component_name>`. Following is an example of merging components and the resulting final values:

Component definition:

```yaml
warehouse_default_usage_grants:
  grants:
    USAGE:
      - rel(role.WRITER)
      - rel(role.READER)
      - rel(role.ADMIN)

warehouse_default_monitor_grants:
  grants:
    MONITOR:
      - SYSADMIN
      - rel(role.ADMIN)

warehouse_default_grants:
  dataops.extends:
    - dataops.merge: component.warehouse_default_usage_grants
    - dataops.merge: component.warehouse_default_monitor_grants
```

Merged component definition:

```yaml
warehouse_default_usage_grants:
  grants:
    USAGE:
      - rel(role.WRITER)
      - rel(role.READER)
      - rel(role.ADMIN)

warehouse_default_monitor_grants:
  grants:
    MONITOR:
      - SYSADMIN
      - rel(role.ADMIN)

warehouse_default_grants:
  grants:
    USAGE:
      - rel(role.WRITER)
      - rel(role.READER)
      - rel(role.ADMIN)
    MONITOR:
      - SYSADMIN
      - rel(role.ADMIN)
```
Merging components into values
As previously mentioned, the parameters defined in an object definition always override the value specified in components.
If, instead of the default override behavior, you want to merge the defined value with the component's value, add the parameter in the format `dataops.merge.<parameter>`.
This merges the values defined in the parameter with the values included from components, recursively.

Component definition:

```yaml
warehouse_default_grants:
  grants:
    USAGE:
      - rel(role.WRITER)
      - rel(role.READER)
      - rel(role.ADMIN)
    MONITOR:
      - SYSADMIN
      - rel(role.ADMIN)
```

```yaml
- warehouse:
    dataops.extends:
      - component.namespacing_none
      - component.warehouse_default_grants
    comment: Warehouse for the Data Science Team
    warehouse_size: SMALL
    dataops.merge.grants:
      USAGE:
        - rel(role.DATA_SCIENCE)
      MONITOR:
        - rel(role.DATA_SCIENCE)
```

Merged component definition:

```yaml
warehouse_default_grants:
  grants:
    USAGE:
      - rel(role.WRITER)
      - rel(role.READER)
      - rel(role.ADMIN)
    MONITOR:
      - SYSADMIN
      - rel(role.ADMIN)
```

```yaml
- warehouse:
    dataops.extends:
      - component.namespacing_none
      - component.warehouse_default_grants
    comment: Warehouse for the Data Science Team
    warehouse_size: SMALL
    grants:
      USAGE:
        - rel(role.WRITER)
        - rel(role.READER)
        - rel(role.ADMIN)
        - rel(role.DATA_SCIENCE)
      MONITOR:
        - SYSADMIN
        - rel(role.ADMIN)
        - rel(role.DATA_SCIENCE)
```
Jinja template in components
You can also use components to store Jinja logic under a named key. This lets you reuse Jinja in objects and components without repeating it.
Component with conditional namespacing:

```yaml
share_namespacing:
  {% if env.DATAOPS_ENV_NAME == 'PROD' %}
  namespacing: prefix
  {% else %}
  namespacing: both
  {% endif %}
```

```yaml
- share:
    name: INGESTION
    dataops.extends:
      - component.share_namespacing
    comment: "Ingestion Share"
```

Rendered component in PROD:

```yaml
share_namespacing:
  namespacing: prefix # On the PROD environment, the Jinja resolves
                      # to namespacing 'prefix'
```

```yaml
- share:
    name: INGESTION
    namespacing: prefix # When the component is resolved, the share
                        # has namespacing set to 'prefix'
    comment: "Ingestion Share"
```
The rel() function
As part of SOLE for Data Products, we have also updated how relationships between managed objects are defined. You can see this in the examples used so far.
SOLE for Data Products puts more focus on validation and preventing accidental configuration. To achieve this, you must define relationships to managed objects using the `rel()` function.
There are two ways to use the `rel()` function:

- The function takes a single string parameter in the format `<object_type>.<object_id>`.
- The function takes multiple string parameters in the format `database.<database_id>,schema.<schema_id>,<object_type>.<object_id>`.
rel() with a single parameter
```yaml
- warehouse:
    name: INGESTION
    comment: Warehouse for Ingestion operations
    resource_monitor: rel(resource_monitor.INGESTION) # Relation to the managed resource monitor 'INGESTION'
    warehouse_size: LARGE
    grants:
      MONITOR:
        - SYSADMIN
        - rel(role.ADMIN)

- stage:
    comment: Load CSV Data
    name: CSV_STAGE
    schema: rel(schema.FIVETRAN_SCHEMA) # Relation to the schema 'FIVETRAN_SCHEMA'
    # Relation to the database '{{ env.DATAOPS_DATABASE }}', whose value is resolved at runtime
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    # Relation to the managed file format
    # '{{ env.DATAOPS_DATABASE }}'.'FIVETRAN_SCHEMA'.'INGESTION_CSV_FORMAT'
    file_format: rel(file_format.INGESTION_CSV_FORMAT)
    url: s3://source-bucket/csv/
    CREDENTIALS: AWS_KEY_ID = '{{ AWS.INGESTION.ACCESS_KEY_ID }}' AWS_SECRET_KEY = '{{ AWS.INGESTION.SECRET_ACCESS_KEY }}'
```
rel() with multiple parameters
```yaml
- function:
    name: SQL_FUNCTION
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES_SCHEMA)
    comment: "Function using SQL language"
    language: sql
    return_type: NUMBER(38,0)
    return_behavior: IMMUTABLE
    null_input_behavior: "RETURNS NULL ON NULL INPUT"
    # The table is present in a different schema
    statement: "select count(*) from rel(table.PRODUCT,schema.TEST_SCHEMA)"
    arguments:
      A:
        type: NUMBER
      B:
        type: NUMBER

- function:
    name: SQL_FUNCTION
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES_SCHEMA)
    comment: "Function using SQL language"
    language: sql
    return_type: NUMBER(38,0)
    return_behavior: IMMUTABLE
    null_input_behavior: "RETURNS NULL ON NULL INPUT"
    # The table is present in a different database and schema
    statement: "select count(*) from rel(database.DATABASE_1,schema.TEST_SCHEMA,table.PRODUCT)"
    arguments:
      A:
        type: NUMBER
      B:
        type: NUMBER

- procedure:
    name: TEST_PROCEDURE
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES_SCHEMA)
    return_type: VARCHAR(16777216)
    # The table is present in a different schema
    statement: >-
      var rs = snowflake.execute( { sqlText:
        `INSERT INTO rel(table.table1,schema.TEST_SCHEMA)("column1")
        SELECT 'value 1' AS "column 1" ;`
      } );
      return 'Done.';
    arguments: {}
    execute_as: OWNER
    language: javascript
    manage_mode: all

- procedure:
    name: TEST_PROCEDURE
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES_SCHEMA)
    return_type: VARCHAR(16777216)
    # The table is present in a different database and schema
    statement: >-
      var rs = snowflake.execute( { sqlText:
        `INSERT INTO rel(database.DATABASE_1,schema.TEST_SCHEMA,table.table1)("column1")
        SELECT 'value 1' AS "column 1" ;`
      } );
      return 'Done.';
    arguments: {}
    execute_as: OWNER
    language: javascript
    manage_mode: all
```
Explicit relations
The `rel()` function makes the relation between managed entities more explicit. It reduces the chances of SOLE making assumptions about implied configuration and instead focuses on the specified configuration.
The new structure also makes the configuration more readable when trying to determine what is and isn't managed by SOLE.
If SOLE does not manage an object, the object's name is a plain string not wrapped in `rel()`.
```yaml
- warehouse:
    name: INGESTION
    comment: Warehouse for Ingestion operations
    resource_monitor: rel(resource_monitor.INGESTION) # Relation to the managed resource monitor 'INGESTION'
    warehouse_size: LARGE
    grants:
      MONITOR:
        - SYSADMIN # Role not managed by SOLE
        - rel(role.ADMIN) # Role managed by SOLE
```
Configuration validation
Using `rel()` to define relations allows SOLE to verify the configuration more thoroughly before it is applied in Snowflake.
If an object is defined as a relation but its configuration is missing, whether due to a partial commit or an accidental delete, SOLE throws an error early in the pipeline. This prevents accidentally dropping objects due to incorrect or partial configuration.
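For example, assuming a project in which the definition of the role WRITER was accidentally deleted, a configuration like this sketch would fail at validation rather than being applied:

```yaml
# No '- role: ... name: WRITER' definition exists anywhere in the project,
# so SOLE cannot resolve this relation and raises an error at validation time
- warehouse:
    name: REPORTING
    grants:
      USAGE:
        - rel(role.WRITER)
```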
Unique SOLE ID
For account-level objects, you can use the config name when adding a relation in `rel()`. For schemas or schema-level objects, you might have two or more objects with the same name but different parents. If only the config name is specified in `rel()`, SOLE might not be able to resolve the relationship with certainty, leading to an error.
For such cases, you can use the new parameter `dataops.id` to assign a unique ID to the object. This ID is used only to resolve relationships and does not affect the object name in Snowflake.
```yaml
- file_format:
    name: CSV_FORMAT
    dataops.id: INGESTION_CSV_FORMAT
    schema: rel(schema.INGESTION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    dataops.extends:
      - component.csv_file_format

- file_format:
    name: CSV_FORMAT
    schema: rel(schema.TRANSFORMATION_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    dataops.extends:
      - component.csv_file_format

- stage:
    comment: Load CSV Data
    name: CSV_STAGE
    schema: rel(schema.FIVETRAN_SCHEMA)
    database: rel(database.{{ env.DATAOPS_DATABASE }})
    file_format: rel(file_format.INGESTION_CSV_FORMAT)
```
In the above example, two file formats with the same name `CSV_FORMAT` are present in different schemas. The stage `FIVETRAN_SCHEMA.CSV_STAGE` wants to use the file format present in `INGESTION_SCHEMA`. To do so, you assign the unique ID `INGESTION_CSV_FORMAT` to that file format using `dataops.id`. You can then reference this file format by specifying the unique ID in `rel()`.
Schema-level objects
If the specified object is a schema-level object, SOLE can resolve the parent's details if the object has a unique name or if you have defined a unique ID. This reduces the number of code lines that have to be added when referencing schema-level objects.
Following is an example of adding a relation to a file format from stages:
Adding a file format in the current configuration:

```yaml
databases:
  DATABASE_1:
    schemas:
      SCHEMA_1:
        file_formats:
          FORMAT_1:
            format_type: CSV
        stages:
          STAGE_1:
            file_format:
              format_name: FORMAT_1
      SCHEMA_2:
        stages:
          STAGE_2:
            file_format:
              format_name: FORMAT_1
              format_schema: SCHEMA_1
  DATABASE_2:
    schemas:
      SCHEMA_3:
        stages:
          STAGE_3:
            file_format:
              format_name: FORMAT_1
              format_schema: SCHEMA_1
              format_database: DATABASE_1
```

Adding a file format using rel():

```yaml
- file_format:
    name: FORMAT_1
    schema: rel(schema.SCHEMA_1)
    database: rel(database.DATABASE_1)

- stage:
    name: STAGE_1
    schema: rel(schema.SCHEMA_1)
    database: rel(database.DATABASE_1)
    file_format: rel(file_format.FORMAT_1)

- stage:
    name: STAGE_2
    schema: rel(schema.SCHEMA_2)
    database: rel(database.DATABASE_1)
    file_format: rel(file_format.FORMAT_1)

- stage:
    name: STAGE_3
    schema: rel(schema.SCHEMA_3)
    database: rel(database.DATABASE_2)
    file_format: rel(file_format.FORMAT_1)
```
You can also use the `rel()` function as a parameter value, such as when using tags.
Adding a tag in the current configuration:

```yaml
databases:
  DATABASE_1:
    schemas:
      SCHEMA_1:
        tags:
          TAG_1:
            allowed_values: ["Red", "Yellow", "Blue"]
        tables:
          TABLE_1:
            with_tags:
              TAG_1:
                value: Red
            columns:
              COLUMN_1:
                type: NUMBER(38,0)
      SCHEMA_2:
        tables:
          TABLE_2:
            with_tags:
              TAG_1:
                value: Yellow
                schema: SCHEMA_1
            columns:
              COLUMN_1:
                type: NUMBER(38,0)
  DATABASE_2:
    schemas:
      SCHEMA_3:
        tables:
          TABLE_3:
            with_tags:
              TAG_1:
                value: Blue
                schema: SCHEMA_1
                database: DATABASE_1
            columns:
              COLUMN_1:
                type: NUMBER(38,0)
```

Adding a tag using rel():

```yaml
- tag:
    name: TAG_1
    schema: rel(schema.SCHEMA_1)
    database: rel(database.DATABASE_1)
    allowed_values: ["Red", "Yellow", "Blue"]

- table:
    name: TABLE_1
    schema: rel(schema.SCHEMA_1)
    database: rel(database.DATABASE_1)
    with_tags:
      rel(tag.TAG_1): Red
    columns:
      COLUMN_1:
        type: NUMBER(38,0)

- table:
    name: TABLE_2
    schema: rel(schema.SCHEMA_2)
    database: rel(database.DATABASE_1)
    with_tags:
      rel(tag.TAG_1): Yellow
    columns:
      COLUMN_1:
        type: NUMBER(38,0)

- table:
    name: TABLE_3
    schema: rel(schema.SCHEMA_3)
    database: rel(database.DATABASE_2)
    with_tags:
      rel(tag.TAG_1): Blue
    columns:
      COLUMN_1:
        type: NUMBER(38,0)
```
Resolving relation
To resolve the relationship defined in `rel()`, SOLE performs a lookup in the following order:
- Search for a defined relation in the list of unique IDs.
- Search for a defined relation in the same schema as the current object.
- Search for defined relations in all schemas in the same database as the current object.
- Search for defined relations in all schemas in all the databases.
If a unique match is found, SOLE uses it to resolve `rel()`. If multiple matches are found, SOLE throws an error. In such cases, it is recommended that you assign a unique ID using `dataops.id`.
Converting the DataOps template to use SOLE for Data Products
To use SOLE for Data Products with your existing DataOps projects, you must migrate your current Snowflake object configuration to the new structure. Steps to migrate the DataOps Template Project are documented in SOLE Data Products Migration, providing an example of how to convert current object configurations to SOLE for Data Products.
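As a taste of what this conversion involves, a grouped definition in the current structure becomes a self-describing list item in SOLE for Data Products (a minimal sketch; the role name is illustrative):

```yaml
# Current SOLE: objects grouped under the object-type key
roles:
  READER:
    comment: Read-only access
```

```yaml
# SOLE for Data Products: self-describing object definition
- role:
    name: READER
    comment: Read-only access
```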