Skip to main content

DataOps Runner for Snowpark Container Services Installation

Feature release status badge: PubPrev
PubPrev

Snowflake Account Setup

Before you can install the runner, you need to create some objects in Snowflake to enable the DataOps runner to operate within Snowpark Container Services.

spcs-runner-role-hierarchy !!shadow!!

Step 1 - Create the account-level Snowflake objects

Execute the following script. The object names in the script are configurable and can be changed to suit your naming conventions.

Snowflake objects at account level
USE ROLE SYSADMIN;
CREATE ROLE IF NOT EXISTS DATAOPS_RUNNER_ADMIN_ROLE;
GRANT ROLE DATAOPS_RUNNER_ADMIN_ROLE TO ROLE SYSADMIN;

CREATE DATABASE IF NOT EXISTS DATAOPS_RUNNER_DB;
GRANT OWNERSHIP ON DATABASE DATAOPS_RUNNER_DB TO ROLE DATAOPS_RUNNER_ADMIN_ROLE COPY CURRENT GRANTS;
GRANT OWNERSHIP ON SCHEMA DATAOPS_RUNNER_DB.PUBLIC TO ROLE DATAOPS_RUNNER_ADMIN_ROLE COPY CURRENT GRANTS;

CREATE OR REPLACE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE
WITH WAREHOUSE_SIZE='X-SMALL';
GRANT OWNERSHIP ON WAREHOUSE DATAOPS_RUNNER_WAREHOUSE TO ROLE DATAOPS_RUNNER_ADMIN_ROLE COPY CURRENT GRANTS;

USE ROLE ACCOUNTADMIN;
CREATE COMPUTE POOL IF NOT EXISTS DATAOPS_RUNNER_COMPUTE_POOL
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = CPU_X64_S;
GRANT USAGE, MONITOR ON COMPUTE POOL DATAOPS_RUNNER_COMPUTE_POOL TO ROLE DATAOPS_RUNNER_ADMIN_ROLE;
GRANT OWNERSHIP ON COMPUTE POOL DATAOPS_RUNNER_COMPUTE_POOL TO ROLE DATAOPS_RUNNER_ADMIN_ROLE
COPY CURRENT GRANTS;

Notes:

  • Creates a role DATAOPS_RUNNER_ADMIN_ROLE to own the objects that have usage shared between one or more runners.
  • Creates a dedicated database for all the DataOps runners named DATAOPS_RUNNER_DB
  • Creates a warehouse named DATAOPS_RUNNER_WAREHOUSE for the runner services to use.
  • Creates a single compute pool name DATAOPS_RUNNER_COMPUTE_POOL for all runner services to run on.
    • The instance type CPU_X64_S is our current recommended size. See working with compute pools for more detail.
    • The same compute pool will be reused for all on-demand orchestrators runs during pipeline executing
    • You may want to expand this configuration so that each runner has its own compute pool.
optimizing Snowflake credit spent

Contact us if you would like to discuss a compute pool setup with two compute pools - one for the long-running runner and one for the on-demand orchestrators.

Step 2 - Create the service user for image syncing

Service user for image syncing
USE ROLE SYSADMIN;
CREATE ROLE IF NOT EXISTS DATAOPS_RUNNER_IMAGES_ROLE;
GRANT ROLE DATAOPS_RUNNER_IMAGES_ROLE TO ROLE DATAOPS_RUNNER_ADMIN_ROLE;

USE ROLE USERADMIN;
CREATE USER IF NOT EXISTS DATAOPS_RUNNER_USER
TYPE = 'LEGACY_SERVICE'
PASSWORD = 'change-me-123'
DEFAULT_ROLE = DATAOPS_RUNNER_IMAGES_ROLE;

USE ROLE SYSADMIN;
GRANT ROLE DATAOPS_RUNNER_IMAGES_ROLE TO USER DATAOPS_RUNNER_USER;

Notes:

  • Creates a role DATAOPS_RUNNER_IMAGES_ROLE, which will own the image repository.
  • Creates a user named DATAOPS_RUNNER_USER, which is used by the runner service to synchronize orchestrator container images. This user is not used for anything else.
  • The user is granted the DATAOPS_RUNNER_IMAGES_ROLE role.

Step 3 - Create the image repository

Create the image repository which the DATAOPS_RUNNER_IMAGES_ROLE role will own. The objects created in this step are used between all runners.

Image repository
USE ROLE DATAOPS_RUNNER_ADMIN_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;

CREATE SCHEMA IF NOT EXISTS IMAGES;
GRANT OWNERSHIP ON SCHEMA IMAGES TO ROLE DATAOPS_RUNNER_IMAGES_ROLE
COPY CURRENT GRANTS;

GRANT USAGE ON WAREHOUSE DATAOPS_RUNNER_WAREHOUSE TO ROLE DATAOPS_RUNNER_IMAGES_ROLE;
GRANT USAGE ON DATABASE DATAOPS_RUNNER_DB TO ROLE DATAOPS_RUNNER_IMAGES_ROLE;

USE ROLE DATAOPS_RUNNER_IMAGES_ROLE;
USE SCHEMA IMAGES;
CREATE IMAGE REPOSITORY IF NOT EXISTS RUNNER_REPOSITORY;
CREATE OR REPLACE SECRET DATAOPS_RUNNER_USER_SECRET TYPE = password
USERNAME = 'DATAOPS_RUNNER_USER'
PASSWORD = 'change-me-123';

Notes:

  • Creates a schema named IMAGES to hold shared objects - an image repository, and the Runner User secret.
    • Note: Contact us if you would prefer each runner to have its own image repository and we can advise on the best setup.
  • Creates a Snowflake image repository named RUNNER_REPOSITORY to store the DataOps runner and DataOps orchestrators container images.
    • as the repository is a schema object, we don't use the DATAOPS_ prefix
  • Creates a Snowflake secret named DATAOPS_RUNNER_USER_SECRET. These are the credentials the runner user uses to access the image repository.

Step 4 - Create an access integration

The DataOps Jobs within Snowpark Container Services use the default host to connect to Snowflake. This requires external access integration with a network rule allowing access from your service to any external services (including the DataOps.live platform).

External access integration
USE ROLE DATAOPS_RUNNER_ADMIN_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA PUBLIC;

CREATE NETWORK RULE IF NOT EXISTS ALLOW_ALL_RULE
TYPE = 'HOST_PORT'
MODE= 'EGRESS'
VALUE_LIST = ('0.0.0.0:443','0.0.0.0:80');

USE ROLE SYSADMIN;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION DATAOPS_ALLOW_ALL_INTEGRATION
ALLOWED_NETWORK_RULES = (PUBLIC.ALLOW_ALL_RULE)
ENABLED = true;

GRANT OWNERSHIP ON INTEGRATION DATAOPS_ALLOW_ALL_INTEGRATION TO ROLE DATAOPS_RUNNER_ADMIN_ROLE
COPY CURRENT GRANTS;

Notes:

  • Creates a network rule named ALLOW_ALL_RULE.
    • The VALUE_LIST in the network rule is set to allow all outbound traffic. You can restrict this to specific IP addresses or ranges, or domains if required.
    • See Restrictive Network Rule for the minimum required domains.
  • Creates an external access integration named DATAOPS_ALLOW_ALL_INTEGRATION.

Push the DataOps Runner image for Snowpark Container Services to your Snowflake account

You will need to upload the DataOps Runner image for Snowpark Container Services to your Snowflake account's image repository, to make it available for starting a runner service. These steps are required for the first-time setup and whenever you need to upgrade the DataOps Runner for Snowpark Container Services.

Step 1 - Pull the DataOps Runner image for Snowpark Container Services

On a machine with Docker installed, run the following commands to pull the DataOps Runner image from the DataOps.live Docker registry.

docker login --username dataopsread --password dckr_pat_82FQ4O6N4yb6fXJc15kIvX4Qrtg
docker pull dataopslive/dataops-spcs-runner:latest

Step 2 - Push the image to your image repository

Now tag and push the image to your Snowflake image repository.

First set the environment variables for your Snowflake account and the image repository path.

# The name of your Snowflake account
export SNOWFLAKE_ACCOUNT_NAME="<orgname>-<account-name>"
# The location of the image repository created during the Setup steps
# In the format "<database>/<schema>/<image-repository>"
export IMAGE_REPOSITORY_PATH="dataops_runner_db/images/runner_repository" # Must be lowercase
export IMAGE_REPOSITORY=$SNOWFLAKE_ACCOUNT.registry.snowflakecomputing.com/$IMAGE_REPOSITORY_PATH
note

If you changed the database or schema name in previous steps, make sure to reflect them here and convert the names to lowercase.

Login to the Snowflake image registry

You can authenticate to the Snowflake image registry using a username and password or other methods.

Username password login
export SNOWFLAKE_USER="DATAOPS_RUNNER_USER" # The user created during the Setup steps
export SNOWFLAKE_PASSWORD="change-me-123" # The password set during the Setup steps

Note that you must use the same password as you did in the previous steps.

docker login $IMAGE_REPOSITORY -u $SNOWFLAKE_USER -p $SNOWFLAKE_PASSWORD
Other authentication methods

For the other ways of authenticating to the Snowflake image registry, see the Snowflake CLI documentation.

If you want to push the DataOps Runner image to your Snowflake account as your currently logged in user, make sure you have the right permissions to push the image to the image repository. For example:

GRANT ROLE DATAOPS_RUNNER_ADMIN_ROLE TO USER <your-logged-in-username>

Tag and push the image

docker tag dataopslive/dataops-spcs-runner:latest $IMAGE_REPOSITORY/dataops-spcs-runner:latest
docker push $IMAGE_REPOSITORY/dataops-spcs-runner:latest

Create a DataOps Runner for Snowpark Container Services

multiple runners

The following setup steps are identical whether you are installing a single runner or opting for a multiple-runner setup. Each runner uses a separate schema and role to isolate them from each other. Remember to use the correct registration token for each runner

Step 1 - Fetch a registration token from the platform

The registration token is generated automatically in DataOps.live and is used to link together the runner you are about to create with your specific DataOps project or group.

note

Runner registration tokens are scoped to either a top-level group, sub-group, or project.

Follow these steps to obtain your registration token:

  1. Connect to the data product platform.

  2. Open the group (preferred) or project you want to create the runner for.

  3. At the group level, follow the below steps:

    1. Click CI/CD → Runners. Choosing the group makes the runner available to all projects in that group.

    2. Expand Register a group runner on the top right and copy the registration token.

      Group runner token !!shadow!!

  4. At the project level, follow the below steps:

    1. Click Settings → CI/CD.
    2. Find the Runners section and click Expand.
    3. Copy the registration token from inside the Project runners section under Set up a project runner for a project.

    Project runner token !!shadow!!

Step 2 - Create Runner specific Roles, Schema, and Stages

info

In this section, we will use SQL variables when creating objects. This will make it easier to change their names. Updated the variables, replacing TEAM_A with a value appropriate for this runner. Depending on how you segregate your runners, you could use the name of the team that owns the runner or the name of the project the runner is for, for example, 'TEAM_NAME' or 'PROJECT_NAME'. Additionally, you may want to include an environment name in the runner name, for example, 'TEAM_NAME_DEV' or 'PROJECT_NAME_DEV.'

Update and run the following to set your variables.

-- Update all values before running this and change TEAM_A to your naming convention
SET runner_role_name='DATAOPS_RUNNER_TEAM_A_ROLE';
SET runner_schema_name='TEAM_A';
SET runner_service_name='DATAOPS_RUNNER_TEAM_A_SERVICE';
SET runner_registration_token='"<registration-token>"';
SET runner_agent_name='"DATAOPS_RUNNER_TEAM_A"';
SET runner_agent_tag='"DATAOPS_RUNNER_TEAM_A"';
SET account_name='"<orgname>-<account-name>"'; -- https://docs.snowflake.com/en/user-guide/admin-account-identifier

Notes:

  • runner_role_name - The name of the role that will be used by the runner service.
  • runner_schema_name - The name of the schema that will be used by the runner service.
  • runner_service_name - The name of the service that will be created for the runner.
  • runner_registration_token - The token you obtained from the DataOps.live platform.
  • runner_agent_name - The name to give to this runner.
  • runner_agent_tag - The tag(s) to give to this runner. It can be a comma separated list.
  • account_name - The name of your Snowflake account in <orgname>-<account-name> format.
caution

The values for runner_registration_token, runner_agent_name, runner_agent_tag, and account_name must be enclosed in double quotes " for them to be used in the service specification in step 3.

After setting the SQL variables, run the following SQL script to create the remaining Snowflake objects specific to the Runner.

Runner roles, schema, and stage volume
USE ROLE SYSADMIN;
CREATE ROLE IF NOT EXISTS IDENTIFIER($runner_role_name);
GRANT ROLE IDENTIFIER($runner_role_name) TO ROLE DATAOPS_RUNNER_ADMIN_ROLE;

USE ROLE DATAOPS_RUNNER_ADMIN_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
CREATE SCHEMA IF NOT EXISTS IDENTIFIER($runner_schema_name);
GRANT OWNERSHIP ON SCHEMA IDENTIFIER($runner_schema_name) TO ROLE IDENTIFIER($runner_role_name)
COPY CURRENT GRANTS;

USE ROLE DATAOPS_RUNNER_ADMIN_ROLE;
GRANT USAGE ON DATABASE DATAOPS_RUNNER_DB TO ROLE IDENTIFIER($runner_role_name);
GRANT USAGE ON SCHEMA DATAOPS_RUNNER_DB.IMAGES TO ROLE IDENTIFIER($runner_role_name);
GRANT READ ON IMAGE REPOSITORY DATAOPS_RUNNER_DB.IMAGES.RUNNER_REPOSITORY TO ROLE IDENTIFIER($runner_role_name);
GRANT READ ON SECRET DATAOPS_RUNNER_DB.IMAGES.DATAOPS_RUNNER_USER_SECRET TO ROLE IDENTIFIER($runner_role_name);
GRANT USAGE ON INTEGRATION DATAOPS_ALLOW_ALL_INTEGRATION TO ROLE IDENTIFIER($runner_role_name);
GRANT USAGE ON COMPUTE POOL DATAOPS_RUNNER_COMPUTE_POOL TO ROLE IDENTIFIER($runner_role_name);
GRANT USAGE ON WAREHOUSE DATAOPS_RUNNER_WAREHOUSE TO ROLE IDENTIFIER($runner_role_name);

USE ROLE IDENTIFIER($runner_role_name);
USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA IDENTIFIER($runner_schema_name);
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');

Notes:

  • Creates a role for the runner.
  • Creates a schema to hold the runner specific objects.
  • Grants ownership of the schema to the runner role.
  • Grants the necessary permissions to the runner role.
  • Creates a stage to share files between the runner and pipeline jobs, and optionally store the initial vault content.

Step 3 - Create the Service

Run the following SQL script to create the DataOps runner service for Snowpark Container Services using the same SQL variables you set in step 2.

USE ROLE IDENTIFIER($runner_role_name);
USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA IDENTIFIER($runner_schema_name);
CREATE SERVICE IF NOT EXISTS IDENTIFIER($runner_service_name)
IN COMPUTE POOL "DATAOPS_RUNNER_COMPUTE_POOL"
FROM SPECIFICATION_TEMPLATE $$
spec:
containers:
- name: runner
image: "/DATAOPS_RUNNER_DB/IMAGES/RUNNER_REPOSITORY/dataops-spcs-runner:latest"
env:
SNOWFLAKE_ACCOUNT_NAME: {{account_name}}
SNOWFLAKE_WAREHOUSE: "DATAOPS_RUNNER_WAREHOUSE"
SNOWFLAKE_COMPUTE_POOL: "DATAOPS_RUNNER_COMPUTE_POOL"
SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATION: "DATAOPS_ALLOW_ALL_INTEGRATION"
DATAOPS_URL: "https://app.dataops.live/"
REGISTRATION_TOKEN: {{registration_token}}
AGENT_NAME: {{agent_name}}
AGENT_TAG: {{agent_tag}}
IMAGE_REPOSITORY_PATH: "DATAOPS_RUNNER_DB/IMAGES/RUNNER_REPOSITORY"
LOG_LEVEL: "INFO"
ALLOWED_IMAGES: ""
SKIP_STARTUP_IMAGE_SYNC: "1"
secrets:
- snowflakeSecret: "IMAGES.DATAOPS_RUNNER_USER_SECRET"
secretKeyRef: "username"
envVarName: "SNOWFLAKE_USER"
- snowflakeSecret: "IMAGES.DATAOPS_RUNNER_USER_SECRET"
secretKeyRef: "password"
envVarName: "SNOWFLAKE_PASSWORD"
volumeMounts:
- name: orchestratorvolumes
mountPath: /orchestrator_volumes/
volumes:
- name: orchestratorvolumes
source: "@ORCHESTRATOR_VOLUMES"
$$
USING (
account_name=>$account_name,
registration_token=>$runner_registration_token,
agent_name=>$runner_agent_name,
agent_tag=>$runner_agent_tag
)
EXTERNAL_ACCESS_INTEGRATIONS = ("DATAOPS_ALLOW_ALL_INTEGRATION")
MIN_INSTANCES=1
MAX_INSTANCES=1;

Notes:

  • Creates a service in the schema created in the previous step.
  • Environment variables are passed to the runner service to configure the runner.
    • SNOWFLAKE_ACCOUNT_NAME is the name of your Snowflake account in <orgname>-<account-name> format.
    • SNOWFLAKE_WAREHOUSE is the warehouse created in the setup steps. Default DATAOPS_RUNNER_WAREHOUSE
    • SNOWFLAKE_COMPUTE_POOL is the compute pool created in the setup steps. Default DATAOPS_RUNNER_COMPUTE_POOL
    • SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATION is the integration created in the Setup steps.
    • DATAOPS_URL is the URL of the DataOps.live platform.
    • REGISTRATION_TOKEN is the token you obtained from the DataOps.live platform.
    • AGENT_NAME is the name to give to this runner. It must be a unique name across your environments. Choose a name for your environment.
    • AGENT_TAG is the tag(s) to give to this runner. It can be a comma separated list.
    • IMAGE_REPOSITORY_PATH is the path to the image repository created in the setup steps.
    • LOG_LEVEL is the log level for the runner.
    • ALLOWED_IMAGES is a comma-separated list of allowed images for the runner to run. For example if you only want to allow DataOps Orchestrator images you can set this to dataopslive*. An empty string with the value "" allows all images.
    • SKIP_STARTUP_IMAGE_SYNC is a flag to skip the initial image sync. On startup the runner service does an initial sync of all available DataOps orchestrators with the production tag 5-stable. The runner will not be available to run jobs until the initial sync is complete. Set to "1" to skip the initial sync (recommended).
  • The secrets section is used to pass the user secret created earlier containing the credentials that allow the runner service to interact with the image registry to ensure the images specified in your DataOps jobs are available.

Once the service is created, the runner will start. After the initial image sync, the runner will register with the DataOps.live platform, if not skipped. To confirm, check the runner status on the platform. In your group, go to CI/CD → Runners.

info

If the compute pool has to start, the runner service will be in a PENDING state until the compute pool becomes ACTIVE. You can check the status of the service by running the SQL query documented here.

Initial image sync for production orchestrators

When the runner first starts, it will sync all available DataOps orchestrators with the tag 5-stable. This process can take some time (up to an hour). If you want to skip this initial sync, you can set the SKIP_STARTUP_IMAGE_SYNC environment variable to "1" in the service creation script. Regardless of whether the initial sync in skipped, there is also background process which keeps the 5-stable tagged Orchestrator images up-to-date. Additionally, any images that are not yet available when a job starts will be synced during the job's execution. Once an image has been synced once it will be available for all jobs to run.

Monitoring the DataOps Runner for Snowpark Container Services

View runner service status

See the status of the runner service by running the following SQL query:

USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA IDENTIFIER($runner_schema_name);
SHOW SERVICE CONTAINERS IN SERVICE IDENTIFIER($runner_service_name);

View runner service logs

See the logs of the runner service by running the following SQL query:

USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA IDENTIFIER($runner_schema_name);
SELECT value AS log_line
FROM TABLE(
SPLIT_TO_TABLE(SYSTEM$GET_SERVICE_LOGS($runner_service_name, 0, 'runner', 500), '\n')
);

For more info, see the Snowflake GET_SERVICE_LOGS documentation.

Stop the DataOps runner for Snowpark Container Services

You can stop the runner by dropping the service. To start the runner again you will need to run the create service script again. Note if you do this it will create a new Runner entry in the DataOps.live platform and the old one will eventually show as offline.

To drop the runner service, run the following SQL script:

USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA IDENTIFIER($runner_schema_name);
DROP SERVICE IF EXISTS IDENTIFIER($runner_service_name);

Uninstall a single DataOps runner for Snowpark Container Services

danger

This script will remove the runner service and all objects specific to that runner only.

USE ROLE ACCOUNTADMIN;
USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA IDENTIFIER($runner_schema_name);
DROP SERVICE IF EXISTS IDENTIFIER($runner_service_name);
DROP SCHEMA IF EXISTS IDENTIFIER($runner_schema_name);
DROP ROLE IF EXISTS IDENTIFIER($runner_role_name);

Uninstall all DataOps runner for Snowpark Container Services objects

danger

This script will remove all objects created in the setup steps.

USE ROLE ACCOUNTADMIN;
DROP EXTERNAL ACCESS INTEGRATION IF EXISTS DATAOPS_ALLOW_ALL_INTEGRATION;
DROP DATABASE IF EXISTS DATAOPS_RUNNER_DB;
DROP USER IF EXISTS DATAOPS_RUNNER_USER;
DROP COMPUTE POOL IF EXISTS DATAOPS_RUNNER_COMPUTE_POOL;
DROP WAREHOUSE IF EXISTS DATAOPS_RUNNER_WAREHOUSE;
DROP ROLE IF EXISTS DATAOPS_RUNNER_IMAGES_ROLE;
DROP ROLE IF EXISTS DATAOPS_RUNNER_ADMIN_ROLE;