DataOps Runner for Snowpark Container Services Additional Configuration
Secrets
Populating the DataOps vault
The vault concepts section describes the DataOps vault in more detail. We recommend you read that page before proceeding any further.
By default, the runner starts with an empty ({}) vault.yml and a random salt. Optionally, you can provide an initial vault.yml and vault.salt by adding the files to the orchestrator volumes stage created during runner setup.
- Create your vault files:
echo {} | sudo tee vault.yml > /dev/null
echo $RANDOM | md5sum | head -c 20 | sudo tee vault.salt > /dev/null
- Update your vault.yml as necessary (a sketch of a populated file follows this list).
- In Snowsight, click "Add Data" and select "Load files into a Stage".
- Select your vault.yml and vault.salt files to upload, then complete the form by selecting the database and schema the runner service belongs to, as well as the stage ORCHESTRATOR_VOLUMES. Finally, specify the directory path /secrets/ and click "Upload".
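For reference, a populated vault.yml could look like the sketch below. This is only illustrative: the key hierarchy mirrors the SNOWFLAKE.SOLE and SNOWFLAKE.TRANSFORM structure used later in this guide, and every value is a placeholder to replace with your own details:
SNOWFLAKE:
  ACCOUNT: myaccount
  SOLE:
    ACCOUNT: myaccount
    USERNAME: DATAOPS_SOLE_ADMIN
    PASSWORD: "changeme"
    ROLE: DATAOPS_SOLE_ADMIN_ROLE
    WAREHOUSE: DATAOPS_SOLE_WAREHOUSE
  TRANSFORM:
    USERNAME: DATAOPS_SOLE_ADMIN
    PASSWORD: "changeme"
    ROLE: DATAOPS_SOLE_ADMIN_ROLE
    WAREHOUSE: DATAOPS_SOLE_WAREHOUSE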
Use Snowflake Secrets in a DataOps Project
You can also optionally create Snowflake secrets and then reference them in your DataOps.live project. The DataOps Runner for Snowpark Container Services automatically injects these secrets into the jobs in your pipeline. Note: this currently only works for Snowflake secrets of type GENERIC_STRING.
Step 1 - Create Snowflake Secrets
For example:
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;
USE SCHEMA DATA_SCHEMA;
CREATE OR REPLACE SECRET SNOWFLAKE_ACCOUNT TYPE=GENERIC_STRING SECRET_STRING='account_name'; -- changeme
CREATE OR REPLACE SECRET SNOWFLAKE_USERNAME TYPE=GENERIC_STRING SECRET_STRING='DATAOPS_SOLE_ADMIN';
CREATE OR REPLACE SECRET SNOWFLAKE_PASSWORD TYPE=GENERIC_STRING SECRET_STRING='******************'; -- changeme
CREATE OR REPLACE SECRET SNOWFLAKE_ROLE TYPE=GENERIC_STRING SECRET_STRING='DATAOPS_SOLE_ADMIN_ROLE';
Make sure the USE SCHEMA line is correct for your setup. In the default setup the schema is DATA_SCHEMA; in a schema-per-runner setup, use the schema of the appropriate runner, as shown in the sketch below.
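For instance, with the schema-per-runner layout described under Multiple Runners below, you would create the secrets inside that runner's schema. A minimal sketch, assuming a runner schema named RUNNER_A:
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA RUNNER_A; -- the schema of the runner that should receive these secrets
CREATE OR REPLACE SECRET SNOWFLAKE_ACCOUNT TYPE=GENERIC_STRING SECRET_STRING='account_name'; -- changeme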
Step 2 - Reference the secrets in your DataOps Project
The implementation of resolving secrets is subject to change.
In your DataOps.live project, you can look up Snowflake secrets and convert them to job variables as follows:
variables:
SECRETS_MANAGER: "none"
SNOWFLAKE_SECRET_LOOKUP_01: "do://snowflake/SNOWFLAKE_ACCOUNT"
SNOWFLAKE_SECRET_LOOKUP_02: "do://snowflake/SNOWFLAKE_USERNAME"
SNOWFLAKE_SECRET_LOOKUP_03: "do://snowflake/SNOWFLAKE_PASSWORD"
SNOWFLAKE_SECRET_LOOKUP_04: "do://snowflake/SNOWFLAKE_ROLE"
SNOWFLAKE_SECRET_LOOKUP_05: "do://snowflake/SNOWFLAKE_WAREHOUSE"
First, we skip using a secrets manager and resolve variables from the Snowflake secrets directly, which is why we set SECRETS_MANAGER: "none".
Second, the name of the variable you use to look up a given secret can be anything except the secret name itself. In the example, we used SNOWFLAKE_SECRET_LOOKUP_NN.
The lookup searches for variable values that start with the string do://snowflake/ and uses the rest of the string as the name of the Snowflake secret to resolve. The resolved value is then passed to your jobs as a variable named after the secret.
When executed, the example will create the variables:
SNOWFLAKE_ACCOUNT: value from snowflake secret
SNOWFLAKE_USERNAME: value from snowflake secret
SNOWFLAKE_PASSWORD: value from snowflake secret
SNOWFLAKE_ROLE: value from snowflake secret
SNOWFLAKE_WAREHOUSE: value from snowflake secret
The lookup variable names can't be the same as the secret names because the variables in variables.yml take precedence over the injected secrets.
Consider the following example:
variables:
SNOWFLAKE_SECRET_LOOKUP_03: "do://snowflake/SNOWFLAKE_PASSWORD"
# or
MY_REFERENCE_FOR_SOLE_PASSWORD: "do://snowflake/SNOWFLAKE_PASSWORD"
Both produce the same result: the environment variable SNOWFLAKE_PASSWORD in the jobs will hold the value of the Snowflake secret named "SNOWFLAKE_PASSWORD".
Don't do this:
variables:
SNOWFLAKE_PASSWORD: "do://snowflake/SNOWFLAKE_PASSWORD"
If you do, the environment variable SNOWFLAKE_PASSWORD in the jobs will retain the literal string value "do://snowflake/SNOWFLAKE_PASSWORD" instead of the secret value.
Step 3 - Populate the DataOps vault from the secrets
Finally, you may need to update the vault.template.yml file in your project to populate the DataOps vault for use by some orchestrators, such as the SOLE and MATE orchestrators. For example:
SNOWFLAKE:
ACCOUNT: "{{ env.SNOWFLAKE_ACCOUNT }}"
SOLE:
ACCOUNT: "{{ env.SNOWFLAKE_ACCOUNT }}"
USERNAME: "{{ env.SNOWFLAKE_USERNAME }}"
PASSWORD: "{{ env.SNOWFLAKE_PASSWORD }}"
ROLE: "{{ env.SNOWFLAKE_ROLE }}"
WAREHOUSE: "{{ env.SNOWFLAKE_WAREHOUSE }}"
TRANSFORM:
USERNAME: "{{ env.SNOWFLAKE_USERNAME }}"
PASSWORD: "{{ env.SNOWFLAKE_PASSWORD }}"
ROLE: "{{ env.SNOWFLAKE_ROLE }}"
WAREHOUSE: "{{ env.SNOWFLAKE_WAREHOUSE }}"
For a SOLE job to work, you need either the DATAOPS_SOLE_XXX variables set or the SNOWFLAKE.SOLE.XXX keys present in the DataOps vault. Since our examples used Snowflake secrets named SNOWFLAKE_XXX, we mapped them to the vault keys SNOWFLAKE.SOLE.XXX with the YAML file shown; a sketch of the variable-based alternative follows this paragraph.
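If you prefer the variable-based route, you could create Snowflake secrets that carry the SOLE variable names and resolve them with the same lookup mechanism. This is only a sketch: it assumes SOLE reads variables named DATAOPS_SOLE_ACCOUNT, DATAOPS_SOLE_USERNAME, DATAOPS_SOLE_PASSWORD, DATAOPS_SOLE_ROLE, and DATAOPS_SOLE_WAREHOUSE, and that you created Snowflake secrets with exactly those names in Step 1:
variables:
  SECRETS_MANAGER: "none"
  SOLE_SECRET_LOOKUP_01: "do://snowflake/DATAOPS_SOLE_ACCOUNT"
  SOLE_SECRET_LOOKUP_02: "do://snowflake/DATAOPS_SOLE_USERNAME"
  SOLE_SECRET_LOOKUP_03: "do://snowflake/DATAOPS_SOLE_PASSWORD"
  SOLE_SECRET_LOOKUP_04: "do://snowflake/DATAOPS_SOLE_ROLE"
  SOLE_SECRET_LOOKUP_05: "do://snowflake/DATAOPS_SOLE_WAREHOUSE"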
Similarly, for MATE jobs to work, the SNOWFLAKE.TRANSFORM.XXX DataOps vault keys must be present. Again, we map them from the Snowflake secrets SNOWFLAKE_XXX.
Using the vault protects your credentials at runtime. Credentials are stored encrypted during pipeline execution and masked in log files.
Once you have mapped the variables into the vault hierarchy, you can use them in any job definition with the standard DATAOPS_VAULT() function. For example:
my job:
variables:
ACCOUNT: DATAOPS_VAULT(SNOWFLAKE.ACCOUNT)
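The same pattern works for any key mapped in Step 3. A short sketch of a job reading the MATE credentials from the vault; the job name and the variable names on the left are purely illustrative:
my transform job:
  variables:
    TRANSFORM_USERNAME: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.USERNAME)
    TRANSFORM_WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.WAREHOUSE)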
Multiple Runners
In the Create database-level Snowflake objects step of the installation instructions, objects are created under one schema. However, for a multiple-runner setup, we recommend using a separate schema per runner to isolate the runners from each other. For example:
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;
CREATE SCHEMA IF NOT EXISTS DATA_SCHEMA;
USE SCHEMA DATA_SCHEMA;
CREATE IMAGE REPOSITORY IF NOT EXISTS RUNNER_REPOSITORY;
CREATE SCHEMA IF NOT EXISTS RUNNER_A;
USE SCHEMA RUNNER_A;
CREATE STAGE IF NOT EXISTS EXEC_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE SCHEMA IF NOT EXISTS RUNNER_B;
USE SCHEMA RUNNER_B;
CREATE STAGE IF NOT EXISTS EXEC_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
Notes:
- Creates a schema named DATA_SCHEMA to store the image repository. The image repository is shared between the runners.
- Creates a schema named RUNNER_A to store the runner objects for the first runner.
- Creates a schema named RUNNER_B to store the runner objects for the second runner.
- Creates stages named EXEC_VOLUMES and ORCHESTRATOR_VOLUMES in each runner schema.
- Each runner service can be created within the appropriate schema.
- Additionally, populating the vault and creating secrets can also be done within the appropriate runner schema, as sketched below.
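For example, to keep the vault files separate per runner, you can upload them to each runner's own ORCHESTRATOR_VOLUMES stage. A minimal sketch using SnowSQL (or any other client that supports PUT); the local file paths are placeholders:
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
PUT file:///path/to/vault.yml @RUNNER_A.ORCHESTRATOR_VOLUMES/secrets AUTO_COMPRESS=FALSE;
PUT file:///path/to/vault.salt @RUNNER_A.ORCHESTRATOR_VOLUMES/secrets AUTO_COMPRESS=FALSE;
The Snowsight flow described earlier works just as well; simply pick the runner schema and its stage when completing the upload form.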
Restrictive Network Rule
In the Create an Access Integration step of the installation instructions, you create a network rule to allow the runner to connect to the Snowflake and DataOps platforms.
The Network Rule used in the setup instructions allows all outbound HTTP and HTTPS connections. However, if you want to restrict the outbound connections to specific domains, you can create a more restrictive network rule. For example:
CREATE NETWORK RULE IF NOT EXISTS ALLOW_RESTRICTED_RULE
TYPE = 'HOST_PORT'
MODE = 'EGRESS'
VALUE_LIST = ('app.dataops.live', 'myaccount.snowflakecomputing.com', 'dataops-artifacts-production.s3.eu-west-2.amazonaws.com');
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION DATAOPS_ALLOW_RESTRICTED_INTEGRATION
ALLOWED_NETWORK_RULES = (ALLOW_RESTRICTED_RULE)
ENABLED = true;
GRANT OWNERSHIP ON ACCESS INTEGRATION DATAOPS_ALLOW_RESTRICTED_INTEGRATION TO ROLE DATAOPS_RUNNER_ROLE
COPY CURRENT GRANTS;
Don't forget to change myaccount in the snowflakecomputing.com domain to your own account identifier.
Notes:
- Creates a network rule named ALLOW_RESTRICTED_RULE.
  - The VALUE_LIST contains the minimum required domains for the runner to access the DataOps.live platform and your Snowflake account. You may need to update the VALUE_LIST depending on what external services your pipeline jobs call out to (see the sketch after these notes).
- Creates an external access integration named DATAOPS_ALLOW_RESTRICTED_INTEGRATION.
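If your jobs later need to reach additional endpoints, you can extend the existing rule rather than recreate the integration. A hedged sketch; the extra domain is purely illustrative, and note that VALUE_LIST is replaced in full, so repeat the existing entries:
ALTER NETWORK RULE ALLOW_RESTRICTED_RULE SET
  VALUE_LIST = ('app.dataops.live', 'myaccount.snowflakecomputing.com', 'dataops-artifacts-production.s3.eu-west-2.amazonaws.com', 'hooks.example.com');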