
DataOps Runner for Snowpark Container Services Additional Configuration

Secrets

Populating the DataOps vault

tip

The vault concepts section describes the DataOps vault in more detail. We recommend you read that page before proceeding any further.

By default, the runner starts with an empty vault.yml (containing only {}) and a randomly generated salt.

Optionally, you can provide an initial vault.yml and vault.salt by uploading the files to the ORCHESTRATOR_VOLUMES stage created during runner setup.

  1. Create your vault files:
echo {} | sudo tee vault.yml > /dev/null
echo $RANDOM | md5sum | head -c 20 | sudo tee vault.salt > /dev/null
  2. Update your vault.yml as necessary (see the example after these steps).

  3. In Snowsight, click "Add Data" and select "Load files into a Stage".

(Screenshot: Snowsight Add Data page)

  4. Select your vault.yml and vault.salt files to upload, and complete the form by selecting the database and schema the runner service belongs to, as well as the stage ORCHESTRATOR_VOLUMES. Finally, specify the directory path /secrets/ and click "Upload".
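
For illustration, a populated vault.yml is simply a YAML hierarchy of keys and values. The keys and values below are assumptions for this example only: they mirror the SNOWFLAKE.SOLE and SNOWFLAKE.TRANSFORM hierarchy used later on this page, and your project may need different keys.

vault.yml
SNOWFLAKE:
  SOLE:
    ACCOUNT: myaccount # hypothetical values -- replace with your own
    USERNAME: DATAOPS_SOLE_ADMIN
    PASSWORD: changeme
    ROLE: DATAOPS_SOLE_ADMIN_ROLE
    WAREHOUSE: DATAOPS_WAREHOUSE
  TRANSFORM:
    USERNAME: DATAOPS_SOLE_ADMIN
    PASSWORD: changeme
    ROLE: DATAOPS_SOLE_ADMIN_ROLE
    WAREHOUSE: DATAOPS_WAREHOUSE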

Use Snowflake Secrets in a DataOps Project

You can also optionally create Snowflake Secrets and then reference them in your DataOps.live project. The DataOps Runner for Snowpark Container Services will automatically inject these secrets into the jobs in your pipeline. Note: this currently only works for Snowflake secrets of type GENERIC_STRING.

Step 1 - Create Snowflake Secrets

For example:

USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;
USE SCHEMA DATA_SCHEMA;

CREATE OR REPLACE SECRET SNOWFLAKE_ACCOUNT TYPE=GENERIC_STRING SECRET_STRING='account_name'; -- changeme
CREATE OR REPLACE SECRET SNOWFLAKE_USERNAME TYPE=GENERIC_STRING SECRET_STRING='DATAOPS_SOLE_ADMIN';
CREATE OR REPLACE SECRET SNOWFLAKE_PASSWORD TYPE=GENERIC_STRING SECRET_STRING='******************'; -- changeme
CREATE OR REPLACE SECRET SNOWFLAKE_ROLE TYPE=GENERIC_STRING SECRET_STRING='DATAOPS_SOLE_ADMIN_ROLE';
CREATE OR REPLACE SECRET SNOWFLAKE_WAREHOUSE TYPE=GENERIC_STRING SECRET_STRING='warehouse_name'; -- changeme
caution

Make sure the USE SCHEMA line matches your setup. If you are using the default setup, the schema is DATA_SCHEMA. If you are using a schema-per-runner setup, use the appropriate runner schema.
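
For example, here is a minimal sketch of the schema-per-runner variant of Step 1, assuming a runner schema named RUNNER_A as created in the Multiple Runners section below:

schema per runner secrets
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;
USE SCHEMA RUNNER_A; -- the schema of this specific runner, not DATA_SCHEMA

-- then create the secrets exactly as in Step 1, for example:
CREATE OR REPLACE SECRET SNOWFLAKE_ACCOUNT TYPE=GENERIC_STRING SECRET_STRING='account_name'; -- changeme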

Step 2 - Reference the secrets in your DataOps Project

subject to change

The implementation of resolving secrets is subject to change.

In your DataOps.live project, you can look up Snowflake secrets and convert them to job variables as follows:

variables.yml
variables:
  SECRETS_MANAGER: "none"
  SNOWFLAKE_SECRET_LOOKUP_01: "do://snowflake/SNOWFLAKE_ACCOUNT"
  SNOWFLAKE_SECRET_LOOKUP_02: "do://snowflake/SNOWFLAKE_USERNAME"
  SNOWFLAKE_SECRET_LOOKUP_03: "do://snowflake/SNOWFLAKE_PASSWORD"
  SNOWFLAKE_SECRET_LOOKUP_04: "do://snowflake/SNOWFLAKE_ROLE"
  SNOWFLAKE_SECRET_LOOKUP_05: "do://snowflake/SNOWFLAKE_WAREHOUSE"

First, we skip using a secrets manager and resolve variables from the Snowflake secrets directly. Thus, we use SECRETS_MANAGER: "none".

Second, the name of the variable you use to look up a given secret can be anything except the secret name itself. In the example, we used SNOWFLAKE_SECRET_LOOKUP_NN. The lookup searches for variable values that start with the string do://snowflake/ and treats the rest of the string as the name of the Snowflake secret to resolve. The resolved value is then passed to the executed jobs as a job variable named after the Snowflake secret.

When executed, the example will create the variables:

resolved variables
SNOWFLAKE_ACCOUNT: value from snowflake secret
SNOWFLAKE_USERNAME: value from snowflake secret
SNOWFLAKE_PASSWORD: value from snowflake secret
SNOWFLAKE_ROLE: value from snowflake secret
SNOWFLAKE_WAREHOUSE: value from snowflake secret
The name of the lookup variable and the resolved variable must be different

The reason they can't be the same is that the variables in the variables.yml take precedence over the injected secrets.

Consider the following example:

variables:
  SNOWFLAKE_SECRET_LOOKUP_03: "do://snowflake/SNOWFLAKE_PASSWORD"
  # or
  MY_REFERENCE_FOR_SOLE_PASSWORD: "do://snowflake/SNOWFLAKE_PASSWORD"

Both produce the same result: the environment variable SNOWFLAKE_PASSWORD in the jobs will contain the value of the Snowflake secret named SNOWFLAKE_PASSWORD.

Don't do this:

variables:
  SNOWFLAKE_PASSWORD: "do://snowflake/SNOWFLAKE_PASSWORD"

In this case, the environment variable SNOWFLAKE_PASSWORD in the jobs will retain the literal string value "do://snowflake/SNOWFLAKE_PASSWORD" instead of the secret's value.

Step 3 - Populate the DataOps vault from the secrets

Finally, you may need to update the vault.template.yml file in your project to populate the DataOps vault for use by some orchestrators, such as the SOLE and MATE orchestrators.

vault.template.yml
SNOWFLAKE:
  ACCOUNT: "{{ env.SNOWFLAKE_ACCOUNT }}"
  SOLE:
    ACCOUNT: "{{ env.SNOWFLAKE_ACCOUNT }}"
    USERNAME: "{{ env.SNOWFLAKE_USERNAME }}"
    PASSWORD: "{{ env.SNOWFLAKE_PASSWORD }}"
    ROLE: "{{ env.SNOWFLAKE_ROLE }}"
    WAREHOUSE: "{{ env.SNOWFLAKE_WAREHOUSE }}"
  TRANSFORM:
    USERNAME: "{{ env.SNOWFLAKE_USERNAME }}"
    PASSWORD: "{{ env.SNOWFLAKE_PASSWORD }}"
    ROLE: "{{ env.SNOWFLAKE_ROLE }}"
    WAREHOUSE: "{{ env.SNOWFLAKE_WAREHOUSE }}"

For a SOLE job to work, you either need to have the variables DATAOPS_SOLE_XXX set or have the keys SNOWFLAKE.SOLE.XXX set in the DataOps vault. Since our examples used the Snowflake secret SNOWFLAKE_XXX, we mapped them to the vault keys SNOWFLAKE.SOLE.XXX with the YAML file shown.

Similarly, for MATE jobs to work, we require SNOWFLAKE.TRANSFORM.XXX DataOps vault keys to be present. Again, we map them from the Snowflake secret SNOWFLAKE_XXX.

Using the vault protects your credentials at runtime. Credentials are stored encrypted during pipeline execution and masked in log files.

Once you have mapped the variables into the vault hierarchy, you can use them in any job definition with the standard DATAOPS_VAULT() function. For example:

my job
my job:
  variables:
    ACCOUNT: DATAOPS_VAULT(SNOWFLAKE.ACCOUNT)
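
Similarly, here is a sketch, with a hypothetical job name, of how a job definition could pull the SNOWFLAKE.TRANSFORM keys mapped above out of the vault:

my transform job
my transform job:
  variables:
    USERNAME: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.USERNAME)
    WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.WAREHOUSE)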

Multiple Runners

In the Create database-level Snowflake objects step of the installation instructions, objects are created under one schema.

However, for a multiple-runner setup, we recommend using different schemas - one per runner - isolating them from each other. For example:

schema per runner setup
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;

CREATE SCHEMA IF NOT EXISTS DATA_SCHEMA;
USE SCHEMA DATA_SCHEMA;
CREATE IMAGE REPOSITORY IF NOT EXISTS RUNNER_REPOSITORY;

CREATE SCHEMA IF NOT EXISTS RUNNER_A;
USE SCHEMA RUNNER_A;
CREATE STAGE IF NOT EXISTS EXEC_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');

CREATE SCHEMA IF NOT EXISTS RUNNER_B;
USE SCHEMA RUNNER_B;
CREATE STAGE IF NOT EXISTS EXEC_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');

Notes:

  • Creates a schema named DATA_SCHEMA to store the image repository. The image repository is shared between the runners.
  • Creates a schema named RUNNER_A to store the runner objects for the first runner.
  • Creates a schema named RUNNER_B to store the runner objects for the second runner.
  • Creates stages named EXEC_VOLUMES and ORCHESTRATOR_VOLUMES in each runner schema.
  • Each runner service can be created within the appropriate schema.
  • Additionally, populating the vault and creating secrets can also be done within the appropriate runner schema.

Restrictive Network Rule

In the Create an Access Integration step of the installation instructions, you create a network rule to allow the runner to connect to the Snowflake and DataOps platforms.

The Network Rule used in the setup instructions allows all outbound HTTP and HTTPS connections. However, if you want to restrict the outbound connections to specific domains, you can create a more restrictive network rule. For example:

CREATE NETWORK RULE IF NOT EXISTS ALLOW_RESTRICTED_RULE
TYPE = 'HOST_PORT'
MODE = 'EGRESS'
VALUE_LIST = ('app.dataops.live', 'myaccount.snowflakecomputing.com', 'dataops-artifacts-production.s3.eu-west-2.amazonaws.com');

USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION DATAOPS_ALLOW_RESTRICTED_INTEGRATION
ALLOWED_NETWORK_RULES = (ALLOW_RESTRICTED_RULE)
ENABLED = true;
GRANT OWNERSHIP ON ACCESS INTEGRATION DATAOPS_ALLOW_RESTRICTED_INTEGRATION TO ROLE DATAOPS_RUNNER_ROLE
COPY CURRENT GRANTS;
Account Name

Don't forget to change myaccount in the snowflakecomputing.com domain to your account.
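
If you are unsure of the hostname, one way to derive it is sketched below. This assumes your account is reached via the organization-accountname URL format; other URL formats (for example, locator-based URLs) exist.

-- sketch: derive the account hostname from the current session context
SELECT LOWER(CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() || '.snowflakecomputing.com');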

Notes:

  • Creates a network rule named ALLOW_RESTRICTED_RULE.
    • The VALUE_LIST contains the minimum required domains for the runner to access the DataOps.live platform and your Snowflake account. You may need to update the VALUE_LIST depending on which external services your pipeline jobs call out to (see the example after these notes).
  • Creates an external access integration named DATAOPS_ALLOW_RESTRICTED_INTEGRATION.
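
For example, here is a sketch of extending the rule later to allow a further, hypothetical external service at api.example.com. Note that SET VALUE_LIST replaces the whole list, so repeat the existing hosts:

ALTER NETWORK RULE ALLOW_RESTRICTED_RULE
SET VALUE_LIST = ('app.dataops.live', 'myaccount.snowflakecomputing.com', 'dataops-artifacts-production.s3.eu-west-2.amazonaws.com', 'api.example.com');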