SOLE Generator

The SOLE Generator is a CLI (Command-line Interface) tool that helps you generate a YAML configuration for SOLE objects from existing DDL statements. This tool is handy when you want to start managing your existing Snowflake databases through the data product platform. Instead of manually creating your database.template.yml file, you can do that using this tool.

dataops solegen concepts !!shadow!!

Below is a list of all the objects that are currently supported:

Prerequisites

First, install the CLI as part of the DataOps CLI package.

Before using the tool, you need to have the Snowflake DDL statements for the objects for which you want to generate the configuration. To get that, you can run the GET_DDL command in your Snowflake account. For example:

select get_ddl('DATABASE', 'SALES_RECORD', true);

The output of the GET_DDL command is the actual SQL CREATE statement of the specific SOLE object, including its children's objects. In the example above, the output will include the CREATE statements for all objects that belong to the SALES_RECORD database (schemas, tables, procedures and so on).

tip

For more accurate parsing, enable the use_fully_qualified_names_for_recreated_objects property of the GET_DDL command.

Usage

The SOLE Generator comes as part of the DataOps CLI tool. If you haven't installed it already, you can do so by following the steps explained in DataOps CLI.

The available options for the SOLE Generator CLI are listed in the image below:

dataops-solegen-help !!shadow!!

If you are using SOLE, you can check them out at any time by running the following command in your terminal window:

dataops sole gen --help

To do the same for SOLE for Data Products, use:

dataops sole-dp gen --help

Upcoming Command Deprecation

The dataops solegen command is scheduled to be deprecated shortly. Kindly transition to using dataops sole gen as the alternative.

Once you have the toolset, you can start generating the YAML configuration for your SOLE objects.

Generating SOLE configuration from a raw DDL statement

The SOLE Generator lets you generate SOLE configuration from raw SQL CREATE queries. You can do that by passing the SQL query as an argument either to the dataops sole gen or dataops sole-dp gen command, depending on the version of SOLE that you are using. For example:

Default Configuration
Data Products Configuration

dataops sole gen "create schema SALES_RECORD.SALES comment='This is a test DataOps.live schema.';"

dataops sole-dp gen "create schema SALES_RECORD.SALES comment='This is a test DataOps.live schema.';"

The YAML configuration generated by the SOLE Generator for the example above would be:

Default Configuration
Data Products Configuration

databases:
  SALES_RECORD:
    schemas:
      SALES:
        comment: This is a test DataOps.live schema.
        manage_mode: all

- schema:
    name: SALES
    database: rel(database.SALES_RECORD)
    comment: This is a test DataOps.live schema.
    manage_mode: all

Generating SOLE configuration from DDL files

Parsing raw DDL statements is useful when you want to generate the configuration for a single object. But, if you want to parse multiple objects simultaneously, you can pass a DDL file, or even a whole directory to the SOLE Generator.

Generate configuration from a single DDL file

To parse a single DDL file, use:

Default Configuration
Data Products Configuration

dataops sole gen /path/to/file/database.sql

dataops sole-dp gen /path/to/file/database.sql

If the content of the DDL file for your database is:

database.sql
create or replace database SALES_RECORD DATA_RETENTION_TIME_IN_DAYS=5;

create or replace schema SALES_RECORD.SALES;

create or replace TABLE SALES_RECORD.SALES.CUSTOMER (
	CUSTOMERID NUMBER(38,0) NOT NULL,
	ACCOUNTNUMBER VARCHAR(16777216),
	FULLNAME VARCHAR(16777216),
	MODIFIEDDATE TIMESTAMP_NTZ(9) NOT NULL
);

create or replace TABLE SALES_RECORD.SALES.PRODUCT (
	PRODUCTID NUMBER(38,0) NOT NULL,
	DESCRIPTION VARCHAR(100),
	UNITPRICE NUMBER(24,2) NOT NULL,
	MODIFIEDDATE TIMESTAMP_NTZ(9) NOT NULL
);

CREATE TASK SALES_RECORD.SALES.mytask1 AFTER db2.schema2.mytask2 as SELECT CURRENT_TIMESTAMP;
CREATE OR REPLACE FILE FORMAT SALES_RECORD.SALES.my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 1
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = true
  COMPRESSION = gzip;
create or replace tag SALES_RECORD.SALES.DATA_DOMAIN_TAG ALLOWED_VALUES 'Sensitive', 'Highly Sensitive' comment= 'data domain tag';
create or replace sequence SALES_RECORD.SALES.SEQUENCE_1 increment by 1 comment= 'Object sequence';
create or replace STREAM SALES_RECORD.SALES.STREAM_1 on table DATABASE_3.SCHEMA_3.TABLE_2 append_only = false comment = 'Object stream' insert_only = false show_initial_rows = false;

create or replace row access policy SALES_RECORD.SALES.rap_sales_manager_regions_1 as (sales_region varchar) returns boolean ->
  'sales_executive_role' = current_role()
      or exists (
            select 1 from salesmanagerregions
              where sales_manager = current_role()
                and region = sales_region
          )
;

Its corresponding YAML configuration generated by the tool would be:

Default Configuration
Data Products Configuration

databases:
  SALES_RECORD:
    data_retention_time_in_days: "5"
    schemas:
      SALES:
        tables:
          CUSTOMER:
            columns:
              CUSTOMERID:
                type: NUMBER(38,0)
                nullable: false
              ACCOUNTNUMBER:
                type: VARCHAR(16777216)
                nullable: true
              FULLNAME:
                type: VARCHAR(16777216)
                nullable: true
              MODIFIEDDATE:
                type: TIMESTAMP_NTZ(9)
                nullable: false
            change_tracking: false
            manage_mode: all
          PRODUCT:
            columns:
              PRODUCTID:
                type: NUMBER(38,0)
                nullable: false
              DESCRIPTION:
                type: VARCHAR(100)
                nullable: true
              UNITPRICE:
                type: NUMBER(24,2)
                nullable: false
              MODIFIEDDATE:
                type: TIMESTAMP_NTZ(9)
                nullable: false
            change_tracking: false
            manage_mode: all
        tasks:
          mytask1:
            sql_statement: SELECT CURRENT_TIMESTAMP;
            after:
              database: db2
              schema: schema2
              name: mytask2
        file_formats:
          my_csv_format:
            format_type: CSV
            manage_mode: all
            compression: gzip
            empty_field_as_null: true
            field_delimiter: "|"
            null_if:
              - "NULL"
              - "null"
            skip_header: 1
        tags:
          DATA_DOMAIN_TAG:
            comment: data domain tag
            allowed_values:
              - Sensitive
              - Highly Sensitive
        sequences:
          SEQUENCE_1:
            comment: Object sequence
            increment: 1
        streams:
          STREAM_1:
            comment: Object stream
            append_only: false
            insert_only: false
            show_initial_rows: false
            on_table:
              database: DATABASE_3
              schema: SCHEMA_3
              name: TABLE_2
            manage_mode: all
        row_access_policies:
          rap_sales_manager_regions_1:
            signature:
              sales_region: varchar
            row_access_expression: |-
              'sales_executive_role' = current_role()
                  OR EXISTS (
                      SELECT 1
                        FROM salesmanagerregions
                       WHERE sales_manager = current_role()
                         AND region = sales_region
                     )
            manage_mode: all
        is_managed: false
        is_transient: false
        manage_mode: all
    namespacing: both
    manage_mode: all

- database:
    name: SALES_RECORD
    data_retention_time_in_days: 5
    namespacing: both
    manage_mode: all
- schema:
    name: SALES
    database: rel(database.SALES_RECORD)
    is_managed: false
    is_transient: false
    manage_mode: all
- table:
    name: CUSTOMER
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    columns:
      CUSTOMERID:
        type: NUMBER(38,0)
        nullable: false
      ACCOUNTNUMBER:
        type: VARCHAR(16777216)
        nullable: true
      FULLNAME:
        type: VARCHAR(16777216)
        nullable: true
      MODIFIEDDATE:
        type: TIMESTAMP_NTZ(9)
        nullable: false
    change_tracking: false
    manage_mode: all
- table:
    name: PRODUCT
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    columns:
      PRODUCTID:
        type: NUMBER(38,0)
        nullable: false
      DESCRIPTION:
        type: VARCHAR(100)
        nullable: true
      UNITPRICE:
        type: NUMBER(24,2)
        nullable: false
      MODIFIEDDATE:
        type: TIMESTAMP_NTZ(9)
        nullable: false
    change_tracking: false
    manage_mode: all
- task:
    name: mytask1
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    sql_statement: SELECT CURRENT_TIMESTAMP
    after:
      database: db2
      schema: schema2
      name: mytask2
    manage_mode: all
- file_format:
    name: my_csv_format
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    format_type: CSV
    manage_mode: all
    compression: gzip
    empty_field_as_null: true
    field_delimiter: "|"
    null_if:
      - "NULL"
      - "null"
    skip_header: 1
- tag:
    name: DATA_DOMAIN_TAG
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    comment: data domain tag
    allowed_values:
      - Sensitive
      - Highly Sensitive
    manage_mode: all
- sequence:
    name: SEQUENCE_1
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    comment: Object sequence
    increment: 1
    manage_mode: all
- stream:
    name: STREAM_1
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    comment: Object stream
    append_only: false
    insert_only: false
    show_initial_rows: false
    on_table:
      database: DATABASE_3
      schema: SCHEMA_3
      table: TABLE_2
    manage_mode: all
- row_access_policy:
    name: rap_sales_manager_regions_1
    database: rel(database.SALES_RECORD)
    schema: rel(schema.SALES)
    signature:
      sales_region: varchar
    row_access_expression: |-
      'sales_executive_role' = current_role()
          OR EXISTS (
              SELECT 1
                FROM salesmanagerregions
               WHERE sales_manager = current_role()
                 AND region = sales_region
             )
    manage_mode: all

Generate configuration from a directory

To parse a whole directory containing multiple DDL files, run:

Default Configuration
Data Products Configuration

dataops sole gen -d /path/to/directory

dataops sole-dp gen -d /path/to/directory

This command merges the SOLE configuration of all DDL files from the input directory into a single output.

If the input directory contains subdirectories, the files from those subdirectories will be parsed too.

Additionally, you can:

Use the -e option to specify the file format that needs parsing. To filter *.sql files only, run:
- Default Configuration
- Data Products Configuration
dataops sole gen -d /path/to/directory -e .sql
dataops sole-dp gen -d /path/to/directory -e .sql

Use the -s option to inherit the schema name from the subdirectory if the schema isn't defined.

If a SQL file is located in a subdirectory and the object name does not include the schema, then the subdirectory name would be used as the schema name.

If a SQL file is located in a subdirectory and the object name already includes the schema, then the --inherit-schema (-s) flag is ignored.

Default Configuration
Data Products Configuration

dataops sole gen -d /path/to/directory -s

dataops sole gen-dp -d /path/to/directory -s

For example, if we have a folder structure as follows:

Snowflake
|- TABLES
|  |- SCHEMA1
|  |  |- TABLE1.sql
|  |  |- TABLE2.sql
|  |- SCHEMA2
|  |  |- TABLE3.sql
|  |  |- TABLE4.sql

You can run the SOLE Generator with the -s option as follows:

Default Configuration
Data Products Configuration

dataops sole gen -d /path/to/directory -s > tables.yaml

dataops sole-dp gen -d /path/to/directory -s > tables.yaml

The resulting tables.yaml file will contain table definitions that specify the schema based on the subdirectory name:

Default Configuration
Data Products Configuration

databases:
  "{{ env.DATAOPS_DATABASE }}":
    schemas:
      schema_1:
        tables:
          TABLE1:
            # contents of TABLE1.sql
          TABLE2:
            # contents of TABLE2.sql
      schema_2:
        tables:
          TABLE3:
            # contents of TABLE3.sql
          TABLE4:
            # contents of TABLE4.sql

- table:
  name: TABLE1
  database: rel(database.{{ env.DATAOPS_DATABASE }})
  schema: rel(schema.schema_1)
  # contents of TABLE1.sql
- table:
  name: TABLE2
  database: rel(database.{{ env.DATAOPS_DATABASE }})
  schema: rel(schema.schema_1)
  # contents of TABLE2.sql
- table:
  name: TABLE3
  database: rel(database.{{ env.DATAOPS_DATABASE }})
  schema: rel(schema.schema_2)
  # contents of TABLE3.sql
- table:
  name: TABLE4
  database: rel(database.{{ env.DATAOPS_DATABASE }})
  schema: rel(schema.schema_2)
  # contents of TABLE4.sql

Using templates to define default parameter values

The SOLE Generator generates YAML configuration files from DDL statements that are subsequently run in SOLE to manage existing Snowflake databases through the data product platform. Sometimes DDLs don't contain all the needed parameters. In such a case, default values are used, but users need to change them manually in all the generated YAML files if they want to use different values.

Using templates with SOLE Generator simplifies customizing parameter values not included initially in DDLs while generating the configuration YAML files for the database objects. You can create templates containing your desired DataOps parameter values and store them on your local machines. Using the template with the dataops sole gen command will merge the template with the incoming DDLs and use the parameters specified in the template files as defaults when generating the YAML files.

Create a template

To create a template, you must create a YAML/JSON file or Python dictionary specifying the parameters and their default values. For example, if you want to set the manage_mode of a schema to none, you can either:

Create a YAML File
Create a JSON File
Use a Python Dictionary

mytemplate.yaml
databases:
  MYDATABASE:
    schemas:
      MYSCHEMA:
        manage_mode: none

mytemplate.json
{
  "databases": {
    "MYDATABASE": {
      "schemas": {
        "MYSCHEMA": {
          "manage_mode": "none"
        }
      }
    }
  }
}

{
    "databases": {
        "MYDATABASE": {
            "schemas": {
                "MYSCHEMA": {
                    "manage_mode": "none"
                }
            }
        }
    }
}

Each template above represents a possible way of setting the manage_mode of the MYSCHEMA schema to none.

See Use wildcards in templates for information about applying the same configuration to all schemas.

Use a template

Once you have created a template, you can use it with the dataops sole gen command by specifying the -t or --template option followed by the path to the template file.

For example, to use the templates mentioned above, you can run either of the following commands depending on the format of your template:

Use a YAML Template File
Use a JSON Template File
Use a Python Dictionary Template

dataops sole gen -t /path/to/mytemplate.yaml "CREATE SCHEMA MYDATABASE.MYSCHEMA"

dataops sole gen -t /path/to/mytemplate.json "CREATE SCHEMA MYDATABASE.MYSCHEMA"

dataops sole gen -t '{"databases": {"MYDATABASE": {"schemas": {"MYSCHEMA": {"manage_mode": "none"}}}}}' "CREATE SCHEMA MYDATABASE.MYSCHEMA"

Any of the above commands will generate the YAML output, including the parameters specified in the template. For example:

databases:
  MYDATABASE:
    schemas:
      MYSCHEMA:
        ...
        manage_mode: none

Use wildcards in templates

You can also use wildcards in templates to specify default values for multiple objects of the same type. For example, to set the manage_mode of all schemas to none, you can create a template choosing one of the following formats:

YAML Template
JSON Template
Python Dictionary Template

mytemplate.yaml
databases:
  MYDATABASE:
    schemas:
      "*":
        manage_mode: none

mytemplate.json
{
  "databases": {
    "MYDATABASE": {
      "schemas": {
        "*": {
          "manage_mode": "none"
        }
      }
    }
  }
}

{
    "databases": {
        "MYDATABASE": {
            "schemas": {
                "*": {
                    "manage_mode": "none"
                }
            }
        }
    }
}

Each template above represents a possible way of setting the manage_mode of all schemas to none using the wildcard * in the name field. You can then use any of these templates with the dataops sole gen command to generate configuration files for all schemas.

To set the manage_mode of all schemas and tables to none, you can create a template choosing one of the following formats:

YAML Template
JSON Template
Python Dictionary Template

mytemplate.yaml
databases:
  MYDATABASE:
    schemas:
      "*":
        manage_mode: none
        tables:
          "*":
            manage_mode: none

mytemplate.json
{
  "databases": {
    "MYDATABASE": {
      "schemas": {
        "*": {
          "manage_mode": "none",
          "tables": {
            "*": {
              "manage_mode": "none"
            }
          }
        }
      }
    }
  }
}

{
    "databases": {
        "MYDATABASE": {
            "schemas": {
                "*": {
                    "manage_mode": "none",
                    "tables": {
                        "*": {
                            "manage_mode": "none"
                        }
                    }
                }
            }
        }
    }
}

To set default values for various configuration parameters related to the SALES_RECORD database, its SALES schema, and its tables, you can create a template choosing one of the following formats:

YAML Template
JSON Template
Python Dictionary Template

template.yaml
databases:
  SALES_RECORD:
    data_retention_time_in_days: "6"
    comment: database comment
    namespacing: prefix
    manage_mode: none
    schemas:
      SALES:
        tables:
          "*":
            comment: table comment
            change_tracking: true
            manage_mode: none
            grants:
              USAGE:
                - ROLE_1
                - SYSADMIN:
                    with_grant_option: true
              MODIFY:
                - ROLE_1
                - ROLE_2
        manage_mode: none
        comment: schema comment

template.json
{
  "databases": {
    "SALES_RECORD": {
      "data_retention_time_in_days": "6",
      "comment": "database comment",
      "namespacing": "prefix",
      "manage_mode": "none",
      "schemas": {
        "SALES": {
          "tables": {
            "*": {
              "comment": "table comment",
              "change_tracking": "true",
              "manage_mode": "none",
              "grants": {
                "USAGE": [
                  "ROLE_1",
                  { "SYSADMIN": { "with_grant_option": "true" } }
                ],
                "MODIFY": ["ROLE_1", "ROLE_2"]
              }
            }
          },
          "manage_mode": "none",
          "comment": "schema comment"
        }
      }
    }
  }
}

{
    "databases": {
        "SALES_RECORD": {
            "data_retention_time_in_days": "6",
            "comment": "database comment",
            "namespacing": "prefix",
            "manage_mode": "none",
            "schemas": {
                "SALES": {
                    "tables": {
                        "*": {
                            "comment": "table comment",
                            "change_tracking": "true",
                            "manage_mode": "none",
                            "grants": {
                                "USAGE": [
                                    "ROLE_1",
                                    {"SYSADMIN": {"with_grant_option": "true"}}
                                ],
                                "MODIFY": ["ROLE_1", "ROLE_2"]
                            }
                        }
                    },
                    "manage_mode": "none",
                    "comment": "schema comment"
                }
            }
        }
    }
}

To use any of the above templates, you can run either of the following commands depending on the format of your template:

YAML Template
JSON Template
Python Dictionary Template

dataops sole gen -t /path/to/template.yaml /path/to/file/database.sql

dataops sole gen -t /path/to/template.json /path/to/file/database.sql

dataops sole gen -t '{
    "databases": {
        "SALES_RECORD": {
            "data_retention_time_in_days": "6",
            "comment": "database comment",
            "namespacing": "prefix",
            "manage_mode": "none",
            "schemas": {
                "SALES": {
                    "tables": {
                        "*": {
                            "comment": "table comment",
                            "change_tracking": "true",
                            "manage_mode": "none",
                            "grants": {
                                "USAGE": [
                                    "ROLE_1",
                                    {"SYSADMIN": {"with_grant_option": "true"}}
                                ],
                                "MODIFY": ["ROLE_1", "ROLE_2"]
                            }
                        }
                    },
                    "manage_mode": "none",
                    "comment": "schema comment"
                }
            }
        }
    }
}' /path/to/file/database.sql

Any of the above commands will generate the YAML output, including the parameters specified in the template. For example:

databases:
  SALES_RECORD:
    data_retention_time_in_days: "6"
    schemas:
      SALES:
        tables:
          CUSTOMER:
            columns:
              CUSTOMERID:
                type: NUMBER(38,0)
                nullable: false
              ACCOUNTNUMBER:
                type: VARCHAR(16777216)
                nullable: true
              FULLNAME:
                type: VARCHAR(16777216)
                nullable: true
              MODIFIEDDATE:
                type: TIMESTAMP_NTZ(9)
                nullable: false
            change_tracking: true
            manage_mode: none
            comment: table comment
            grants:
              USAGE:
                - ROLE_1
                - SYSADMIN:
                    with_grant_option: true
              MODIFY:
                - ROLE_1
                - ROLE_2
          PRODUCT:
            columns:
              PRODUCTID:
                type: NUMBER(38,0)
                nullable: false
              DESCRIPTION:
                type: VARCHAR(100)
                nullable: true
              UNITPRICE:
                type: NUMBER(24,2)
                nullable: false
              MODIFIEDDATE:
                type: TIMESTAMP_NTZ(9)
                nullable: false
            change_tracking: true
            manage_mode: none
            comment: table comment
            grants:
              USAGE:
                - ROLE_1
                - SYSADMIN:
                    with_grant_option: true
              MODIFY:
                - ROLE_1
                - ROLE_2
        is_managed: false
        is_transient: false
        manage_mode: none
        comment: schema comment
    namespacing: prefix
    manage_mode: none
    comment: database comment

Overwriting default database

The SOLE Generator retrieves the database name for each object from its fully qualified name if such name is provided in the DDL query. If a fully qualified name is absent in the DDL query, the SOLE Generator defaults to setting the database name to {{ env.DATAOPS_DATABASE }}.

To overwrite the default database name, use the --database or -db option of the dataops sole gen command:

dataops sole gen "create schema SALES comment='This is a test DataOps.live schema.';" -db MY_DATABASE

The YAML configuration generated by the SOLE Generator for the example above would be:

databases:
  MY_DATABASE:
    schemas:
      SALES:
        comment: This is a test DataOps.live schema.

Output

By default, the SOLE Generator outputs the YAML configuration to the standard output (in the Terminal window). To write the configuration into a YAML file, you can use the -o option of the dataops sole gen command:

Default Configuration
Data Products Configuration

dataops sole gen /path/to/input/file/database.sql -o /path/to/output/file/database.yml

dataops sole-dp gen /path/to/input/file/database.sql -o /path/to/output/file/database.yml

If the output file already exists, you can overwrite it by adding the force flag:

Default Configuration
Data Products Configuration

dataops sole gen /path/to/input/file/database.sql -o /path/to/output/file/database.yml -f

dataops sole-dp gen /path/to/input/file/database.sql -o /path/to/output/file/database.yml -f

Prerequisites​

Usage​

Generating SOLE configuration from a raw DDL statement​

Generating SOLE configuration from DDL files​

Generate configuration from a single DDL file​

Generate configuration from a directory​

Using templates to define default parameter values​

Create a template​

Use a template​

Use wildcards in templates​

Overwriting default database​

Output​