
DataOps Runner for Snowpark Container Services Installation

Feature release status badge: PubPrev

Snowflake Account Setup

Before you can install the runner, you need to create some objects in Snowflake to enable the DataOps runner to operate within Snowpark Container Services.

Step 1 - Create the account-level Snowflake objects

Using the ACCOUNTADMIN role, execute the following script. The object names in the script are configurable and can be changed to suit your naming conventions.

Snowflake objects at account level
USE ROLE ACCOUNTADMIN;

CREATE ROLE IF NOT EXISTS DATAOPS_RUNNER_ROLE;
GRANT ROLE DATAOPS_RUNNER_ROLE TO ROLE ACCOUNTADMIN;

CREATE DATABASE IF NOT EXISTS DATAOPS_RUNNER_DB;
GRANT OWNERSHIP ON DATABASE DATAOPS_RUNNER_DB TO ROLE DATAOPS_RUNNER_ROLE COPY CURRENT GRANTS;

CREATE OR REPLACE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE
WITH WAREHOUSE_SIZE='X-SMALL';
GRANT USAGE ON WAREHOUSE DATAOPS_RUNNER_WAREHOUSE TO ROLE DATAOPS_RUNNER_ROLE;

CREATE COMPUTE POOL IF NOT EXISTS DATAOPS_RUNNER_COMPUTE_POOL
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = CPU_X64_S;
GRANT USAGE, MONITOR ON COMPUTE POOL DATAOPS_RUNNER_COMPUTE_POOL TO ROLE DATAOPS_RUNNER_ROLE;
GRANT OWNERSHIP ON COMPUTE POOL DATAOPS_RUNNER_COMPUTE_POOL TO ROLE DATAOPS_RUNNER_ROLE
COPY CURRENT GRANTS;

CREATE USER IF NOT EXISTS DATAOPS_RUNNER_USER TYPE = 'LEGACY_SERVICE' PASSWORD = 'change-me-123';
GRANT ROLE DATAOPS_RUNNER_ROLE TO USER DATAOPS_RUNNER_USER;

Notes:

  • Creates a role DATAOPS_RUNNER_ROLE required to deploy the DataOps runner to Snowpark Container Services.
  • Creates a dedicated database named DATAOPS_RUNNER_DB for all DataOps runners.
  • Creates a single compute pool named DATAOPS_RUNNER_COMPUTE_POOL for the runner service to run on.
    • The instance type CPU_X64_S is our currently recommended size. See working with compute pools for more detail.
    • The same compute pool is reused for all on-demand orchestrator runs during pipeline execution.
  • Creates a user named DATAOPS_RUNNER_USER used by the runner service to synchronize orchestrator container images. This user is not used for anything else.
multiple runners

If you want to set up multiple runners, we recommend keeping a single database, DATAOPS_RUNNER_DB, and in the next step creating a separate schema for each runner to isolate them from each other. For more information, see the multiple runners section in our documentation.

optimizing Snowflake credit spent

Contact us if you would like to discuss a compute pool setup with two compute pools - one for the long-running runner and one for the on-demand orchestrators.

Step 2 - Create the database-level Snowflake objects

Snowflake objects at database level
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;

CREATE SCHEMA IF NOT EXISTS DATA_SCHEMA;
USE SCHEMA DATA_SCHEMA;
CREATE IMAGE REPOSITORY IF NOT EXISTS RUNNER_REPOSITORY;
CREATE STAGE IF NOT EXISTS EXEC_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');

CREATE OR REPLACE SECRET DATAOPS_RUNNER_USER_SECRET TYPE = password
USERNAME = 'DATAOPS_RUNNER_USER'
PASSWORD = 'change-me-123';

Notes:

  • Creates a schema named DATA_SCHEMA to hold all further objects for a single runner.
    • Note: when setting up multiple runners, change the schema name here and everywhere else it appears in this document.
  • Creates a Snowflake image repository named RUNNER_REPOSITORY to store the DataOps runner and DataOps orchestrator container images.
    • Because the repository is a schema-level object, its name does not use the DATAOPS_ prefix.
  • Creates Snowflake stages to share files between the runner and pipeline jobs, and optionally store the initial vault content.
  • Creates a Snowflake secret named DATAOPS_RUNNER_USER_SECRET for the runner to access the image repository.
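
For multiple runners, the schema-level script can be generated once per schema instead of being edited by hand. The following is a minimal sketch, not part of the official setup: the helper name `runner_schema_sql` and the two schema names are hypothetical. It only prints SQL and does not connect to Snowflake. It deliberately leaves out the CREATE SECRET statement so that no password ends up in generated output.

```shell
# Hypothetical helper: print the schema-level setup SQL for one runner.
# The schema name is the only parameter; all other names match the script above.
runner_schema_sql() {
  schema="$1"
  cat <<SQL
USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE WAREHOUSE DATAOPS_RUNNER_WAREHOUSE;

CREATE SCHEMA IF NOT EXISTS ${schema};
USE SCHEMA ${schema};
CREATE IMAGE REPOSITORY IF NOT EXISTS RUNNER_REPOSITORY;
CREATE STAGE IF NOT EXISTS EXEC_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
CREATE STAGE IF NOT EXISTS ORCHESTRATOR_VOLUMES ENCRYPTION=(TYPE='SNOWFLAKE_SSE');
SQL
}

# Example: emit the script for two hypothetical runner schemas.
for name in RUNNER_A_SCHEMA RUNNER_B_SCHEMA; do
  runner_schema_sql "$name"
done
```

The printed statements can be pasted into a worksheet or piped to whichever SQL client you already use.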

Step 3 - Create an access integration

The DataOps Runner for Snowpark Container Services uses the default host to connect to Snowflake. This requires an external access integration with a network rule that allows your service to reach external services (including the DataOps.live platform).

CREATE NETWORK RULE IF NOT EXISTS ALLOW_ALL_RULE
TYPE = 'HOST_PORT'
MODE = 'EGRESS'
VALUE_LIST = ('0.0.0.0:443','0.0.0.0:80');

USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION DATAOPS_ALLOW_ALL_INTEGRATION
ALLOWED_NETWORK_RULES = (ALLOW_ALL_RULE)
ENABLED = true;
GRANT OWNERSHIP ON INTEGRATION DATAOPS_ALLOW_ALL_INTEGRATION TO ROLE DATAOPS_RUNNER_ROLE
COPY CURRENT GRANTS;

Notes:

  • Creates a network rule named ALLOW_ALL_RULE.
    • The VALUE_LIST in the network rule is set to allow all outbound traffic. You can restrict this to specific IP addresses or ranges, or domains if required.
    • See Restrictive Network Rule for the minimum required domains.
  • Creates an external access integration named DATAOPS_ALLOW_ALL_INTEGRATION.
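
If you restrict the rule to specific domains, the VALUE_LIST can be assembled from a list of host names. The following is a minimal sketch under stated assumptions: the helper names `value_list` and `restricted_rule_sql` and the rule name RESTRICTED_RULE are hypothetical, and the hosts shown are examples only (app.dataops.live is taken from the DATAOPS_URL used later in this guide; example.com is a placeholder). Use the domains listed under Restrictive Network Rule for a real setup.

```shell
# Hypothetical helper: turn host names into a VALUE_LIST of 'host:443' entries.
value_list() {
  list=""
  for host in "$@"; do
    # Prefix a comma before every entry except the first.
    list="${list:+$list,}'${host}:443'"
  done
  printf '(%s)\n' "$list"
}

# Hypothetical helper: print a restricted egress rule for the given hosts.
restricted_rule_sql() {
  cat <<SQL
CREATE NETWORK RULE IF NOT EXISTS RESTRICTED_RULE
TYPE = 'HOST_PORT'
MODE = 'EGRESS'
VALUE_LIST = $(value_list "$@");
SQL
}

# Example hosts only; replace with the minimum required domains.
restricted_rule_sql app.dataops.live example.com
```

The external access integration would then reference RESTRICTED_RULE in ALLOWED_NETWORK_RULES instead of ALLOW_ALL_RULE.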

Push the DataOps Runner image for Snowpark Container Services to your Snowflake account

You will need to upload the DataOps Runner image for Snowpark Container Services to your Snowflake account's image repository, to make it available for starting a runner service. These steps are required for the first-time setup and whenever you need to upgrade the DataOps Runner for Snowpark Container Services.

Step 1 - Pull the DataOps Runner image for Snowpark Container Services

On a machine with Docker installed, run the following commands to pull the DataOps Runner image from the DataOps.live Docker registry.

docker login --username dataopsread --password dckr_pat_82FQ4O6N4yb6fXJc15kIvX4Qrtg
docker pull dataopslive/dataops-spcs-runner:latest

Step 2 - Push the image to your image repository

Now tag and push the image to your Snowflake image repository.

First set the environment variables for your Snowflake account and the image repository path.

# The name of your Snowflake account
export SNOWFLAKE_ACCOUNT="<account_name>"
# The location of the image repository created during the Setup steps
# In the format "<database>/<schema>/<image-repository>"
export IMAGE_REPOSITORY_PATH="dataops_runner_db/data_schema/runner_repository" # Must be lowercase
export IMAGE_REPOSITORY=$SNOWFLAKE_ACCOUNT.registry.snowflakecomputing.com/$IMAGE_REPOSITORY_PATH
note

If you changed the database or schema name in previous steps, make sure to reflect them here and convert the names to lowercase.
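
To avoid lowercasing the path by hand, it can be derived from the object names. This is a small sketch; the helper name `image_repository_path` is hypothetical, and the arguments are the default names from the setup steps.

```shell
# Hypothetical helper: join database, schema and repository names into a
# lowercase "<database>/<schema>/<image-repository>" path.
image_repository_path() {
  printf '%s/%s/%s\n' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]'
}

IMAGE_REPOSITORY_PATH="$(image_repository_path DATAOPS_RUNNER_DB DATA_SCHEMA RUNNER_REPOSITORY)"
export IMAGE_REPOSITORY_PATH
echo "$IMAGE_REPOSITORY_PATH"   # prints dataops_runner_db/data_schema/runner_repository
```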

Log in to the Snowflake image registry

You can authenticate to the Snowflake image registry using a username and password or other methods.

Username password login
export SNOWFLAKE_USER="DATAOPS_RUNNER_USER" # The user created during the Setup steps
export SNOWFLAKE_PASSWORD="change-me-123" # The password set during the Setup steps

Note that you must use the same password as you did in the previous steps.

docker login $IMAGE_REPOSITORY -u $SNOWFLAKE_USER -p $SNOWFLAKE_PASSWORD
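
Passing the password with -p leaves it in your shell history and process list. As an alternative sketch, docker login can read it from standard input with --password-stdin. The function name `registry_login` is hypothetical, and the sketch assumes IMAGE_REPOSITORY, SNOWFLAKE_USER, and SNOWFLAKE_PASSWORD are already exported as shown above.

```shell
# Hypothetical helper: log in to the Snowflake image registry without
# placing the password on the command line.
registry_login() {
  printf '%s' "$SNOWFLAKE_PASSWORD" |
    docker login --username "$SNOWFLAKE_USER" --password-stdin "$IMAGE_REPOSITORY"
}
```

Call `registry_login` once the three variables are set.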
Other authentication methods

For the other ways of authenticating to the Snowflake image registry, see the Snowflake CLI documentation.

If you want to push the DataOps Runner image to your Snowflake account as your currently logged in user, make sure you have the right permissions to push the image to the image repository. For example:

GRANT ROLE DATAOPS_RUNNER_ROLE TO USER <your-logged-in-username>;

Tag and push the image

docker tag dataopslive/dataops-spcs-runner:latest $IMAGE_REPOSITORY/dataops-spcs-runner:latest
docker push $IMAGE_REPOSITORY/dataops-spcs-runner:latest
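
Because the pull, tag, and push steps are repeated for every upgrade, they can be wrapped in one function. This is a sketch only: the function name `publish_runner_image` and its optional tag argument are hypothetical (this guide only uses the latest tag), and it assumes IMAGE_REPOSITORY is exported and that you are logged in to both registries.

```shell
# Hypothetical helper: pull, retag and push one runner image tag.
# Usage: publish_runner_image [tag]   (tag defaults to "latest")
publish_runner_image() {
  tag="${1:-latest}"
  source_image="dataopslive/dataops-spcs-runner:${tag}"
  target_image="${IMAGE_REPOSITORY}/dataops-spcs-runner:${tag}"
  # Chain with && so a failed pull or tag never leads to a push.
  docker pull "$source_image" &&
    docker tag "$source_image" "$target_image" &&
    docker push "$target_image"
}
```

Run `publish_runner_image` for the first-time setup and again for each upgrade.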

Start the DataOps runner for Snowpark Container Services

Step 1 - Fetch registration tokens from the platform

The registration token is generated automatically in DataOps.live and is used to link together the runner you are about to create with your specific DataOps project or group.

note

Runner registration tokens are scoped to either a top-level group, sub-group, or project.

Follow these steps to obtain your registration token:

  1. Connect to the data product platform.

  2. Open the group (preferred) or project you want to create the runner for.

  3. At the group level, follow these steps:

    1. Click CI/CD → Runners. Choosing the group makes the runner available to all projects in that group.

    2. Expand Register a group runner on the top right and copy the registration token.

      Group runner token

  4. At the project level, follow these steps:

    1. Click Settings → CI/CD.
    2. Find the Runners section and click Expand.
    3. Copy the registration token from inside the Project runners section under Set up a project runner for a project.

    Project runner token

Step 2 - Create the DataOps runner service for Snowpark Container Services

Run the following SQL script to create the DataOps runner service for Snowpark Container Services named DATAOPS_RUNNER_SERVICE.

Update the placeholders with your details, and replace any default values with the values you used in the previous steps.

USE ROLE DATAOPS_RUNNER_ROLE;
USE DATABASE DATAOPS_RUNNER_DB;
USE SCHEMA DATA_SCHEMA;

CREATE SERVICE IF NOT EXISTS "DATAOPS_RUNNER_SERVICE"
IN COMPUTE POOL "DATAOPS_RUNNER_COMPUTE_POOL"
FROM SPECIFICATION $$
spec:
  containers:
  - name: runner
    image: "/DATAOPS_RUNNER_DB/DATA_SCHEMA/RUNNER_REPOSITORY/dataops-spcs-runner:latest"
    env:
      SNOWFLAKE_ACCOUNT: "<account_name>"
      SNOWFLAKE_ROLE: "DATAOPS_RUNNER_ROLE"
      SNOWFLAKE_WAREHOUSE: "DATAOPS_RUNNER_WAREHOUSE"
      SNOWFLAKE_COMPUTE_POOL: "DATAOPS_RUNNER_COMPUTE_POOL"
      SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATION: "DATAOPS_ALLOW_ALL_INTEGRATION"
      DATAOPS_URL: "https://app.dataops.live/"
      REGISTRATION_TOKEN: "<your-registration-token>"
      AGENT_NAME: "<dataops-spcs-runner-env>"
      AGENT_TAG: "<your-dataops-runner-tag>"
      IMAGE_REPOSITORY_PATH: "DATAOPS_RUNNER_DB/DATA_SCHEMA/RUNNER_REPOSITORY"
      LOG_LEVEL: "INFO"
      ALLOWED_IMAGES: ""
      SKIP_STARTUP_IMAGE_SYNC: "0"
    secrets:
    - snowflakeSecret: "DATAOPS_RUNNER_USER_SECRET"
      secretKeyRef: "username"
      envVarName: "SNOWFLAKE_USER"
    - snowflakeSecret: "DATAOPS_RUNNER_USER_SECRET"
      secretKeyRef: "password"
      envVarName: "SNOWFLAKE_PASSWORD"
    volumeMounts:
    - name: execserver
      mountPath: /execserver/
  volumes:
  - name: execserver
    source: "@EXEC_VOLUMES/execserver"
$$
EXTERNAL_ACCESS_INTEGRATIONS = ("DATAOPS_ALLOW_ALL_INTEGRATION")
MIN_INSTANCES = 1
MAX_INSTANCES = 1;
variables to change

Change at least the following: SNOWFLAKE_ACCOUNT, REGISTRATION_TOKEN, AGENT_NAME, and AGENT_TAG.

Notes:

  • Creates a service named DATAOPS_RUNNER_SERVICE in the schema DATAOPS_RUNNER_DB.DATA_SCHEMA.
  • Environment variables are passed to the runner service to configure the runner.
    • SNOWFLAKE_ROLE is the role created in the setup steps. Default: DATAOPS_RUNNER_ROLE.
    • SNOWFLAKE_WAREHOUSE is the warehouse created in the setup steps. Default: DATAOPS_RUNNER_WAREHOUSE.
    • SNOWFLAKE_COMPUTE_POOL is the compute pool created in the setup steps. Default: DATAOPS_RUNNER_COMPUTE_POOL.
    • SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATION is the integration created in the Setup steps.
    • DATAOPS_URL is the URL of the DataOps.live platform.
    • REGISTRATION_TOKEN is the token you obtained from the DataOps.live platform. Change this!
    • AGENT_NAME is the name to give to this runner. It must be a unique name across your environments. Choose a name for your environment.
    • AGENT_TAG is the tag or tags to give to this runner. It can be a comma-separated list. Change this!
    • IMAGE_REPOSITORY_PATH is the path to the image repository created in the setup steps.
    • LOG_LEVEL is the log level for the runner.
    • ALLOWED_IMAGES is a comma-separated list of images the runner is allowed to run. For example, to allow only DataOps Orchestrator images, set this to dataopslive*. An empty string ("") allows all images.
    • SKIP_STARTUP_IMAGE_SYNC is a flag to skip the initial image sync. On startup the runner service does an initial sync of all available DataOps orchestrators with the production tag 5-stable. The runner will not be available to run jobs until the initial sync is complete. Set to "1" to skip the initial sync. Useful when debugging.
  • The secrets section is used to pass the user secret created earlier containing the credentials that allow the runner service to interact with the image registry to ensure the images specified in your DataOps jobs are available.
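
Before running the script, it can help to check that no placeholder values are left in it. The following is a minimal sketch; the helper name `find_placeholders` is hypothetical, and it assumes placeholders keep the quoted "<...>" form used in the script above.

```shell
# Hypothetical helper: list lines that still contain a quoted <placeholder>.
# Reads the script on stdin. The exit status is grep's: 0 if any placeholder
# remains, 1 if none were found.
find_placeholders() {
  grep -n -E '"<[^<>"]+>"'
}

# Example with three lines from the env section; two are still placeholders.
find_placeholders <<'EOF'
SNOWFLAKE_ACCOUNT: "<account_name>"
SNOWFLAKE_ROLE: "DATAOPS_RUNNER_ROLE"
REGISTRATION_TOKEN: "<your-registration-token>"
EOF
```

For a saved script, redirect the file into the function instead of using the inline example.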

Once the service is created, the runner starts. After the initial image sync (unless the sync is skipped), the runner registers with the DataOps.live platform. To confirm, check the runner status on the platform: in your group, go to CI/CD → Runners.

info

If the compute pool has to start, the runner service stays in a PENDING state until the compute pool becomes ACTIVE. You can check the status of the service with the SQL query in the View runner service status section.
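
Waiting for the service to leave the PENDING state can be scripted as a simple polling loop. This is a sketch under stated assumptions: the helper name `wait_until_ready` is hypothetical, the status command is supplied by you, and the loop assumes the container status reads READY once the service is running.

```shell
# Hypothetical helper: re-run a status command until its output contains READY.
# Usage: wait_until_ready <max_attempts> <sleep_seconds> <command> [args...]
wait_until_ready() {
  max_attempts="$1"
  sleep_seconds="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # "$@" is the caller-supplied status command.
    if "$@" | grep -q 'READY'; then
      return 0
    fi
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  # Gave up: the status never reported READY within the allowed attempts.
  return 1
}
```

For example, if the Snowflake CLI is installed and configured (an assumption), the status command could be `snow sql -q "SHOW SERVICE CONTAINERS IN SERVICE DATAOPS_RUNNER_DB.DATA_SCHEMA.DATAOPS_RUNNER_SERVICE"`, called as `wait_until_ready 30 10 snow sql -q "..."`.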

Initial image sync for production orchestrators

When the runner first starts, it syncs all available DataOps orchestrators with the tag 5-stable. This process can take some time (up to an hour). To skip this initial sync, set the SKIP_STARTUP_IMAGE_SYNC environment variable to "1" in the service creation script. Regardless of whether the initial sync is skipped, a background process keeps the 5-stable tagged orchestrator images up to date. Additionally, any image that is not yet available when a job starts is synced during the job's execution. After an image has been synced once, it is available to all jobs.

Monitoring the DataOps Runner for Snowpark Container Services

View runner service status

See the status of the runner service by running the following SQL query:

SHOW SERVICE CONTAINERS IN SERVICE DATAOPS_RUNNER_SERVICE;

View runner service logs

See the logs of the runner service by running the following SQL query:

SELECT value AS log_line
FROM TABLE(
SPLIT_TO_TABLE(SYSTEM$GET_SERVICE_LOGS('DATAOPS_RUNNER_SERVICE', 0, 'runner', 500), '\n')
);

For more information, see the Snowflake SYSTEM$GET_SERVICE_LOGS documentation.

Stop the DataOps runner for Snowpark Container Services

You can stop the runner by dropping the service. To start the runner again you will need to run the create service script again.

To drop the runner service, run the following SQL script:

DROP SERVICE IF EXISTS DATAOPS_RUNNER_SERVICE;

Uninstall the DataOps runner for Snowpark Container Services

danger

This script will remove the runner service and all objects created in the setup steps.

USE ROLE ACCOUNTADMIN;

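-- Fully qualified so the statement does not depend on the session's current database and schema.
DROP SERVICE IF EXISTS DATAOPS_RUNNER_DB.DATA_SCHEMA.DATAOPS_RUNNER_SERVICE;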
DROP ROLE IF EXISTS DATAOPS_RUNNER_ROLE;
DROP WAREHOUSE IF EXISTS DATAOPS_RUNNER_WAREHOUSE;
DROP COMPUTE POOL IF EXISTS DATAOPS_RUNNER_COMPUTE_POOL;
DROP USER IF EXISTS DATAOPS_RUNNER_USER;
DROP DATABASE IF EXISTS DATAOPS_RUNNER_DB;
DROP EXTERNAL ACCESS INTEGRATION IF EXISTS DATAOPS_ALLOW_ALL_INTEGRATION;