Setting Up DataOps
Overview
Step 1 - Set up DataOps account
The top-level DataOps group is set up for you by the DataOps.live team. Your first project can be created either by the DataOps.live team or by you in less than a minute.
Once the project is created:

- Edit pipelines/includes/config/variables.yml and set:
  - DATAOPS_PREFIX to a prefix defining the scope of your project; it serves as a prefix for most objects inside Snowflake. For a PoC or demo, you can leave the default value DATAOPS.
    caution: Supported characters in DATAOPS_PREFIX are letters (A-Z), decimal digits (0-9), and the underscore (_). If lowercase letters are used, SOLE adds a prefix and suffix to the value of the variable DATAOPS_DATABASE (available at pipeline run time) and creates a default database with an incorrect name.
  - DATAOPS_VAULT_KEY to a long random string
  A minimal sketch of the resulting file follows this list.
- Edit pipelines/includes/config/agent_tag.yml and set the tag to a value we will use later when setting up the DataOps Runner, e.g.:

  pipelines/includes/config/agent_tag.yml
  .agent_tag:
    tags:
      - dataops-production-runner

- Define additional users for your project. You will need to provide a list of names and email addresses to DataOps.live.
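For reference, here is a minimal sketch of variables.yml after these edits. The values shown are placeholders, and your project's file will typically contain further variables:

pipelines/includes/config/variables.yml
include:
variables:
  ...
  DATAOPS_PREFIX: DATAOPS                    # letters, digits, and underscore only
  DATAOPS_VAULT_KEY: <a long random string>  # treat this value as a secret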
Step 2 - Create Snowflake instance
You will need a Snowflake account to use DataOps. This will be your main Snowflake account (or accounts) for production. However, for PoC purposes, it is often quicker to create a disposable account at signup.snowflake.com. A user with the ACCOUNTADMIN role is needed to run the setup SQL. If you are using a trial account, adjust the SQL below accordingly:
Don't forget to set your password in the PASSWORD field of the script below before running it.
---- ROLES ----
-- Admin role (DATAOPS_SOLE_ADMIN_ROLE)
USE ROLE SECURITYADMIN;
CREATE OR REPLACE ROLE DATAOPS_SOLE_ADMIN_ROLE;
USE ROLE ACCOUNTADMIN;
GRANT
CREATE DATABASE, -- CREATEs needed for SOLE object creation and management
CREATE USER,
CREATE ROLE,
CREATE WAREHOUSE,
CREATE SHARE,
CREATE INTEGRATION,
CREATE NETWORK POLICY,
MANAGE GRANTS -- MANAGE GRANTS needed to allow SOLE to manage users (specifically so it can SHOW USERS internally)
ON ACCOUNT TO ROLE DATAOPS_SOLE_ADMIN_ROLE;
GRANT ROLE DATAOPS_SOLE_ADMIN_ROLE TO ROLE SYSADMIN; -- or to the most appropriate parent role
---- WAREHOUSES ----
CREATE WAREHOUSE DATAOPS_SOLE_ADMIN_WAREHOUSE WITH WAREHOUSE_SIZE='X-SMALL';
GRANT MONITOR, OPERATE, USAGE ON WAREHOUSE DATAOPS_SOLE_ADMIN_WAREHOUSE TO ROLE DATAOPS_SOLE_ADMIN_ROLE;
---- USERS ----
-- Master user
USE ROLE USERADMIN;
CREATE OR REPLACE USER DATAOPS_SOLE_ADMIN
PASSWORD = '' -- Add a secure password here, please!
MUST_CHANGE_PASSWORD = FALSE
DISPLAY_NAME = 'DataOps SOLE User'
DEFAULT_WAREHOUSE = DATAOPS_SOLE_ADMIN_WAREHOUSE
DEFAULT_ROLE = DATAOPS_SOLE_ADMIN_ROLE;
USE ROLE SECURITYADMIN;
GRANT ROLE DATAOPS_SOLE_ADMIN_ROLE TO USER DATAOPS_SOLE_ADMIN;
GRANT ROLE ACCOUNTADMIN TO USER DATAOPS_SOLE_ADMIN; -- Needed for creating resource monitors
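Before continuing, a quick sanity check is worthwhile. A minimal sketch, run while logged in as the DATAOPS_SOLE_ADMIN user created by the script above:

-- Optional sanity check, logged in as DATAOPS_SOLE_ADMIN:
USE ROLE DATAOPS_SOLE_ADMIN_ROLE;
USE WAREHOUSE DATAOPS_SOLE_ADMIN_WAREHOUSE;
SHOW GRANTS TO ROLE DATAOPS_SOLE_ADMIN_ROLE; -- should list the account-level privileges granted above
SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_WAREHOUSE();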
If you want to use an existing Snowflake account, see Privileges for a Fresh Environment and Privileges to Manage Preexisting Objects.
Step 3 - Populate secrets
To connect to Snowflake, you need to set four DataOps Vault keys. The DataOps Secrets Manager Orchestrator is fully documented in its own section. This guide assumes that:
- If using AWS, the DataOps runner uses an IAM role attached to your EC2 instance that has the relevant access to read the keys from Secrets Manager or the SSM Parameter Store.
- If using Azure, the DataOps runner uses a service principal attached to your Azure VM that has the relevant access to read the keys from the Key Vault.
Using AWS Secrets Manager
If you are using AWS Secrets Manager, your configuration should look something like this (with your details substituted):
SNOWFLAKE.SOLE.ACCOUNT
SNOWFLAKE.SOLE.PASSWORD
SNOWFLAKE.SOLE.USERNAME
SNOWFLAKE.SOLE.ROLE
and in your pipelines/includes/config/variables.yml:

include:
variables:
  ...
  SECRETS_SELECTION: <the name of your secret>
  SECRETS_AWS_REGION: <the AWS region for your secret>
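These four keys would live together in a single secret. As an illustration only — assuming the orchestrator reads the secret as flat JSON key/value pairs, which you should verify against the Secrets Manager Orchestrator documentation — the secret body might look like this, with the username and role matching the Step 2 script and a hypothetical account identifier:

{
  "SNOWFLAKE.SOLE.ACCOUNT": "xy12345.eu-west-1",
  "SNOWFLAKE.SOLE.USERNAME": "DATAOPS_SOLE_ADMIN",
  "SNOWFLAKE.SOLE.PASSWORD": "<the password you set in Step 2>",
  "SNOWFLAKE.SOLE.ROLE": "DATAOPS_SOLE_ADMIN_ROLE"
}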
Using AWS SSM Parameter Store
If you are using AWS SSM Parameter Store, your configuration should look something like this (with your details substituted):
/dataops/SNOWFLAKE/SOLE/ACCOUNT
/dataops/SNOWFLAKE/SOLE/PASSWORD
/dataops/SNOWFLAKE/SOLE/ROLE
/dataops/SNOWFLAKE/SOLE/USERNAME
and in your pipelines/includes/config/variables.yml:

include:
variables:
  ...
  SECRETS_MANAGER: AWS_PARAMETER_STORE
  SECRETS_SELECTION: /dataops/SNOWFLAKE/SOLE/
  SECRETS_STRIP_PREFIX: /dataops/
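Our reading of this configuration (an assumption worth verifying against the Secrets Manager Orchestrator documentation) is that SECRETS_SELECTION selects every parameter under that path and SECRETS_STRIP_PREFIX removes the leading /dataops/, so each parameter surfaces in the vault under the same dotted key as in the Secrets Manager variant:

/dataops/SNOWFLAKE/SOLE/ACCOUNT   ->  SNOWFLAKE.SOLE.ACCOUNT
/dataops/SNOWFLAKE/SOLE/USERNAME  ->  SNOWFLAKE.SOLE.USERNAME
/dataops/SNOWFLAKE/SOLE/PASSWORD  ->  SNOWFLAKE.SOLE.PASSWORD
/dataops/SNOWFLAKE/SOLE/ROLE      ->  SNOWFLAKE.SOLE.ROLE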
Using Azure Key Vault
If you are using Azure Key Vault, your configuration should look something like this (with your details substituted). Note that Key Vault secret names may contain only letters, digits, and dashes, which is why these names use - rather than the . separator used by the vault keys:
SNOWFLAKE-SOLE-ROLE
SNOWFLAKE-SOLE-PASSWORD
SNOWFLAKE-SOLE-USERNAME
SNOWFLAKE-SOLE-ACCOUNT
and in your pipelines/includes/config/variables.yml:

include:
variables:
  ...
  SECRETS_MANAGER: AZURE_KEY_VAULT
  SECRETS_AZURE_KEY_VAULT_URL: https://KEY_VAULT_NAME.vault.azure.net/
Step 4 - Set up DataOps runner on AWS EC2 or Azure VM
Follow the detailed instructions on how to set up a runner in the DataOps Administration documentation.
Step 5 - Test your project
At this point, you should be able to run the project using the full-ci.yml pipeline. To run your first pipeline, hover over "CI/CD" in the left-hand navigation and select "Pipelines" from the sub-menu that appears. On the pipelines page, click the "Run Pipeline" button in the upper right, change the Pipeline Type dropdown to "full-ci.yml", and then click "Run Pipeline" at the bottom of the form to start the pipeline.
Step 6 - Specific scope
At this point, you should have a fully working DataOps pipeline. The Snowflake Object Lifecycle Engine will have created some base infrastructure. Some other things you can try now:
- Create a qa branch from main and run the full-ci.yml pipeline in this branch. This will create a QA environment in Snowflake.
- Create a dev branch from main and run the full-ci.yml pipeline in this branch. This will create a Dev environment in Snowflake. The DEV database will be created from the PROD database using Zero Copy Clone.
- Create a my-first-feature branch from dev and run the full-ci.yml pipeline in this branch. This will create a Feature Branch environment in Snowflake. The feature branch database will likewise be created from the PROD database using Zero Copy Clone.
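For context, Zero Copy Clone means Snowflake shares the underlying storage between the source database and the clone, so no data is physically copied until either side changes. Roughly, and assuming the default DATAOPS prefix (your actual database names derive from DATAOPS_PREFIX and your environment configuration), the operation amounts to:

-- Illustrative sketch only; SOLE manages this for you.
CREATE DATABASE DATAOPS_DEV CLONE DATAOPS_PROD;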