Advanced Configuration
All DataOps Orchestrators, including SOLE, support Jinja templating to render YAML files.
Using Vault credentials is another use case of Jinja in DataOps.
Jinja can be used to compute and generate values at run time.
Below are a few examples of using the Jinja template engine to generate definitions at run time.
Refer to the Jinja documentation for advanced configuration, additional use cases, and the list of available filters, tests, etc.
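For instance, here is a minimal sketch of a definition that computes values at render time; the database name and the data_retention_time_in_days parameter are illustrative assumptions rather than part of the examples below.
databases:
  REPORTING_DB:
    # Interpolates the current environment name into the comment
    comment: "Deployed to the {{ env.DATAOPS_ENV_NAME }} environment"
    # Computes a value at render time: longer retention in PROD (illustrative parameter)
    data_retention_time_in_days: {{ 30 if env.DATAOPS_ENV_NAME == 'PROD' else 1 }}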
Branch Specific Configuration
Jinja allows users to add configuration that is only enabled in a specific environment.
In DataOps there are 4 types of environment:
- PROD: This environment is created by pipelines run on the master branch.
- QA: This environment is created by pipelines run on the qa branch.
- DEV: This environment is created by pipelines run on the dev branch.
- FB: This environment is created by pipelines run on any branch other than master, qa, or dev. Feature-branch environment names are in the format FB_BRANCH_NAME, with the branch name in upper case (see the sketch after this list).
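As a hypothetical sketch (assuming non-alphanumeric characters in the branch name map to underscores), a pipeline on a branch named new-feature would see the environment name FB_NEW_FEATURE, which any definition can surface through env.DATAOPS_ENV_NAME:
databases:
  TEST_DATABASE:
    # Renders as "Managed by SOLE in FB_NEW_FEATURE" on that branch
    comment: "Managed by SOLE in {{ env.DATAOPS_ENV_NAME }}"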
The following examples illustrate some branch-specific clauses; the conditions can be adapted as required.
PROD Only
In the following example, the namespacing parameter is set to prefix if the environment is PROD; otherwise, both is used.
roles:
INGESTION_ROLE:
{% if env.DATAOPS_ENV_NAME == 'PROD' %}
namespacing: prefix
{% else %}
namespacing: both
{% endif %}
QA Only
In the following example, the role INGESTION_ROLE is granted USAGE on the database TEST_DATABASE only when the environment is QA.
databases:
TEST_DATABASE:
grants:
USAGE:
- ACCOUNTADMIN
- TRANSFORMATION_ROLE
{% if env.DATAOPS_ENV_NAME == 'QA' %}
- INGESTION_ROLE
{% endif %}
DEV Only
In the following example, manage_mode is set to grants on the table TEST_TABLE only when the environment is DEV.
databases:
TEST_DATABASE:
schemas:
TEST_SCHEMA:
tables:
TEST_TABLE:
{% if env.DATAOPS_ENV_NAME == "DEV" %}
manage_mode: grants
{% endif %}
grants:
SELECT:
- INGESTION_ROLE
columns:
TEST_COLUMN:
type: NUMBER(38,0)
FB Only
In the following example, a comment is added to the schema TEST_SCHEMA only when the environment is FB.
databases:
TEST_DATABASE:
schemas:
TEST_SCHEMA:
{% if env.DATAOPS_ENV_NAME.startswith('FB') %}
comment: "TEST SCHEMA"
{% endif %}
PROD and QA Only
In the following example, the table TEST_TABLE_1 is only created when the environment is PROD or QA.
databases:
TEST_DATABASE:
schemas:
TEST_SCHEMA:
tables:
{% if env.DATAOPS_ENV_NAME == "PROD" or env.DATAOPS_ENV_NAME == "QA" %}
TEST_TABLE_1:
columns:
COLUMN_1:
type: NUMBER(38, 0)
{% endif %}
TEST_TABLE_2:
columns:
COLUMN_1:
type: NUMBER(38, 0)
Every Env except QA
In the following example, the column COLUMN_2 is only created when the environment is not QA.
databases:
TEST_DATABASE:
schemas:
TEST_SCHEMA:
tables:
TEST_TABLE_1:
columns:
COLUMN_1:
type: NUMBER(38, 0)
{% if env.DATAOPS_ENV_NAME != "QA" %}
COLUMN_2:
type: NUMBER(38, 0)
{% endif %}
Every Env except FB
In the following example, the column COLUMN_2 is only created when the environment is not FB.
databases:
TEST_DATABASE:
schemas:
TEST_SCHEMA:
tables:
TEST_TABLE_1:
columns:
COLUMN_1:
type: NUMBER(38, 0)
{% if not env.DATAOPS_ENV_NAME.startswith('FB') %}
COLUMN_2:
type: NUMBER(38, 0)
{% endif %}
Environment-Specific Value
In the following example, the value of namespacing depends on the environment:
- If the environment is PROD, then namespacing is set to none.
- If the environment is QA, then namespacing is set to prefix.
- If the environment is DEV, then namespacing is set to suffix.
- For all other environments, namespacing is set to both.
roles:
INGESTION_ROLE:
{% if env.DATAOPS_ENV_NAME == 'PROD' %}
namespacing: none
{% elif env.DATAOPS_ENV_NAME == 'QA' %}
namespacing: prefix
{% elif env.DATAOPS_ENV_NAME == 'DEV' %}
namespacing: suffix
{% else %}
namespacing: both
{% endif %}
Using Configuration to Generate Values
info
This functionality is only usable in SOLE
Variables can be defined in YAML files as configuration. These variables can be used at run-time to generate definitions for SOLE.
The directory path in which such files are defined must be set in the variable DATAOPS_CONFIGURATION_DIR.
info
YAML files in this directory do not support Jinja; they are used only to declare variables for Jinja.
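As a hedged sketch, the variable could be set alongside other pipeline variables; the variables file location and the directory path shown here are assumptions that depend on your project layout:
variables:
  # Hypothetical directory holding the variable YAML files used for Jinja rendering
  DATAOPS_CONFIGURATION_DIR: $CI_PROJECT_DIR/dataops/snowflake/configuration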
Object Parameter Calculation
Defined variables can be used to generate the value of an object parameter.
In the following example, values.yaml defines the variables to be used:
warehouse_sizes:
- "X-SMALL"
- "SMALL"
- "MEDIUM"
- "LARGE"
- "X-LARGE"
- "2X-LARGE"
- "3X-LARGE"
- "4X-LARGE"
cluster_count:
"X-SMALL":
max_cluster_count: 1
min_cluster_count: 1
"SMALL":
max_cluster_count: 2
min_cluster_count: 1
"MEDIUM":
max_cluster_count: 3
min_cluster_count: 1
"LARGE":
max_cluster_count: 8
min_cluster_count: 1
"X-LARGE":
max_cluster_count: 10
min_cluster_count: 1
"2X-LARGE":
max_cluster_count: 10
min_cluster_count: 2
"3X-LARGE":
max_cluster_count: 10
min_cluster_count: 3
"4X-LARGE":
max_cluster_count: 10
min_cluster_count: 4
The variables defined in the YAML file are used as input when rendering object definitions with Jinja.
The following file contains a warehouse definition that uses the input variables:
{% if env.MAX_CLUSTER_COUNT is defined and env.MAX_CLUSTER_COUNT|int != 0 and env.MAX_CLUSTER_COUNT|int >= 1 and env.MAX_CLUSTER_COUNT|int <= 10 %}
{% set max_cluster_count = env.MAX_CLUSTER_COUNT %}
{% else %}
{% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %}
{% set max_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['max_cluster_count'] %}
{% else %}
{% set max_cluster_count = 1 %}
{% endif %}
{% endif %}
{% if env.MIN_CLUSTER_COUNT is defined and env.MIN_CLUSTER_COUNT|int != 0 and env.MIN_CLUSTER_COUNT|int >= 1 and env.MIN_CLUSTER_COUNT|int <= 10 %}
{% set min_cluster_count = env.MIN_CLUSTER_COUNT %}
{% else %}
{% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %}
{% set min_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['min_cluster_count'] %}
{% else %}
{% set min_cluster_count = 1 %}
{% endif %}
{% endif %}
warehouses:
WAREHOUSE_1:
max_cluster_count: {{ max_cluster_count }}
min_cluster_count: {{ min_cluster_count }}
Depending on the values of MAX_CLUSTER_COUNT, MIN_CLUSTER_COUNT, and WAREHOUSE_SIZE (which are defined as environment variables), the values of max_cluster_count and min_cluster_count are calculated.
The calculated values are then used as the max_cluster_count and min_cluster_count parameters of the warehouse.
The value of max_cluster_count (and similarly min_cluster_count) is calculated as follows:
- If MAX_CLUSTER_COUNT is set and is a valid integer between 1 and 10 (inclusive), that value is used as max_cluster_count.
- If MAX_CLUSTER_COUNT is invalid or missing, and a valid WAREHOUSE_SIZE is defined, the value is taken from the cluster_count map.
- If both MAX_CLUSTER_COUNT and WAREHOUSE_SIZE are invalid or missing, the value is set to 1.
For example, if neither MAX_CLUSTER_COUNT nor MIN_CLUSTER_COUNT is set and WAREHOUSE_SIZE is set to 2X-LARGE, then max_cluster_count and min_cluster_count would be 10 and 2 respectively.
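In that case, the template above would render roughly to:
warehouses:
  WAREHOUSE_1:
    max_cluster_count: 10
    min_cluster_count: 2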
Configurations can be set as per requirements.
lifecycle keyword
All SOLE-supported objects except View accept the lifecycle parameter. General syntax:
lifecycle:
ignore_changes:
- <parameter-name>
- <parameter-name>
prevent_destroy: boolean
The lifecycle keyword accepts two parameters:
ignore_changes: This accepts a list of object parameter names that are intended to be set only when the object is created; after creation these parameters may change, and such changes should be ignored.
Example
databases:
DATABASE_1:
comment: This is DATABASE_1
lifecycle:
ignore_changes:
- comment
prevent_destroy: This accepts a boolean value; setting it to true for an object instructs SOLE not to delete that object.
caution
Destruction is only prevented while the configuration is present. If the configuration is removed, the object will be destroyed. See the Terraform documentation.
Example
databases:
DATABASE_1:
lifecycle:
prevent_destroy: true
Hooks
info
This feature is only available when LIFECYCLE_ACTION is set to AGGREGATE
SOLE supports pre- and post-hooks.
Hooks allow you to add actions to be performed before and after lifecycle management of the defined objects.
The pre- and post-hooks are executed for each run of SOLE.
SOLE has 4 different resource groups:
- Account-Level
- Databases
- Database-Level
- Grants
Account-Level and Databases belong to the same resource group in Snowflake; SOLE separates them so that databases can be managed independently.
For hooks, they can be treated as a single resource group.
Resource Group Hooks
Each resource group has its own pre- and post-hooks.
This allows users to define setup and teardown actions for each resource group, giving more control over the lifecycle of objects.
The execution of SOLE and its resource groups follows the flow illustrated below:
caution
Hooks do not support Object name resolution
Example
Hooks can be defined in any existing or new YAML file that is collected by SOLE (i.e. any file in the directory specified by the environment variable CONFIGURATION_DIR).
account_level_hooks:
pre_hooks:
- command: "ALTER ACCOUNT SET RESOURCE_MONITOR = USAGE_MONITOR;"
environment: snowflake
post_hooks:
- command: "$CI_PROJECT_DIR/dataops/sole_hooks/account_level_post_hook.sql"
environment: snowflake
- command: whoami
environment: bash
database_level_hooks:
post_hooks:
- command: "CALL add_table_to_share('SHAREFOO','DATAOPS_DEMO5_DEV','AW_PERSON','ADDRESS')"
environment: snowflake
grants_hooks:
pre_hooks:
- command: "$CI_PROJECT_DIR/dataops/sole_hooks/grant_pre_hook.sql"
environment: snowflake
post_hooks:
- command: "$CI_PROJECT_DIR/dataops/sole_hooks/grant_post_hook.sql"
environment: snowflake