Setting Up DataOps.live

Overview

Implementing DataOps.live involves three planes in which DataOps.live and customers collaborate to establish a suitable infrastructure and use the data product platform to develop, branch, and deploy code and data.

  • The data plane handles the actual processing and manipulation of data.
  • The control plane coordinates and orchestrates data operations.
  • The management plane oversees the administration and governance of the DataOps environment.

These three planes work together to enable effective data management, processing, and operations within a DataOps framework.

The three main planes in DataOps methodology

Setting up the DataOps.live infrastructure involves the following key steps:

Step 1 - Set up DataOps.live account

  • The DataOps.live team sets up the top-level DataOps group.

  • Your first project is either created for you by the DataOps.live team, or you can create it yourself in less than a minute.

  • Once the project is created:

    • Edit pipelines/includes/config/variables.yml and set:

      • DATAOPS_PREFIX to a value defining the scope of your project; it serves as a prefix for most objects inside Snowflake. For a demo, you can leave the default value DATAOPS.
        warning

        Supported characters in DATAOPS_PREFIX are letters (A-Z), decimal digits (0-9), and an underscore (_). If lowercase letters are used, the prefix and suffix that SOLE adds to build the variable DATAOPS_DATABASE (available at pipeline run time) produce a default database with an incorrect name.

      • DATAOPS_VAULT_KEY to a long random string (one way to generate such a string is shown after this list)
    • Edit pipelines/includes/config/agent_tag.yml and set the tag to a value you will use later when setting up the DataOps Runner, e.g.:

      pipelines/includes/config/agent_tag.yml
      .agent_tag:
        tags:
          - dataops-production-runner
  • Define additional users for your project. You will need to provide a list of names and email addresses to DataOps.live.
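
For DATAOPS_VAULT_KEY above, any sufficiently long random string will do. A quick way to generate one on a Unix-like system (assuming openssl is available; any other cryptographically strong generator is equally fine):

# Print a 64-character hex string to paste into DATAOPS_VAULT_KEY
openssl rand -hex 32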

Step 2 - Set up Snowflake account

You will need a Snowflake account to use the data product platform and start managing your Snowflake data and data environments. See Setting Up Snowflake for information about using SQL to set up a Snowflake account with the role, warehouse, and master user properties.
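
As a sketch only, the SQL involved is along the following lines; the object names DATAOPS_SOLE_ADMIN and DATAOPS_WH are placeholders, and Setting Up Snowflake remains the authoritative reference:

# Illustrative only: adjust names, warehouse size, and grants to your needs.
# Assumes the SnowSQL CLI is installed; export SNOWSQL_PWD or answer the password prompt.
snowsql -a <account_locator> -u <admin_user> -r ACCOUNTADMIN <<'SQL'
CREATE ROLE IF NOT EXISTS DATAOPS_SOLE_ADMIN;
CREATE WAREHOUSE IF NOT EXISTS DATAOPS_WH WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;
CREATE USER IF NOT EXISTS DATAOPS_SOLE_ADMIN
  PASSWORD = '<strong-password>'
  DEFAULT_ROLE = DATAOPS_SOLE_ADMIN
  DEFAULT_WAREHOUSE = DATAOPS_WH;
GRANT ROLE DATAOPS_SOLE_ADMIN TO USER DATAOPS_SOLE_ADMIN;
SQL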

Step 3 - Populate secrets

To connect to Snowflake, you need to set four DataOps Vault keys. The DataOps Secrets Manager Orchestrator is fully documented in this section. This guide assumes that:

  • If using AWS, the DataOps runner uses an IAM role attached to your EC2 instance, which has the relevant access to read the keys from the secrets manager or the SSM parameter store.
  • If using Azure, the DataOps runner uses a service principal attached to your Azure VM which has the relevant access to read the keys from the KeyVault.

Using AWS secrets manager

If you are using AWS Secrets Manager, your configuration should look something like this (with your details substituted):

AWS Secrets Manager

SNOWFLAKE.SOLE.ACCOUNT
SNOWFLAKE.SOLE.PASSWORD
SNOWFLAKE.SOLE.USERNAME
SNOWFLAKE.SOLE.ROLE

and in your pipelines/includes/config/variables.yml include:

pipelines/includes/config/variables.yml
variables:
  ...
  SECRETS_SELECTION: <the name of your secret>
  SECRETS_AWS_REGION: <the AWS region for your secret>
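
For illustration, assuming all four keys live in a single secret whose value is a JSON key/value document (the secret name dataops-snowflake below is just an example), the secret could be created with the AWS CLI as follows; check the Secrets Manager Orchestrator documentation for the exact layout your runner expects:

# Hypothetical example: one secret holding the four keys as JSON key/value pairs
aws secretsmanager create-secret \
  --name dataops-snowflake \
  --secret-string '{
    "SNOWFLAKE.SOLE.ACCOUNT":  "<account_locator>",
    "SNOWFLAKE.SOLE.USERNAME": "<username>",
    "SNOWFLAKE.SOLE.PASSWORD": "<password>",
    "SNOWFLAKE.SOLE.ROLE":     "<role>"
  }'

In that example, SECRETS_SELECTION would be dataops-snowflake.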

Using AWS SSM parameter store

If you are using AWS SSM Parameter Store, your configuration should look something like this (with your details substituted):

AWS Parameter Store

/dataops/SNOWFLAKE/SOLE/ACCOUNT
/dataops/SNOWFLAKE/SOLE/PASSWORD
/dataops/SNOWFLAKE/SOLE/ROLE
/dataops/SNOWFLAKE/SOLE/USERNAME

and in your pipelines/includes/config/variables.yml include:

pipelines/includes/config/variables.yml
variables:
  ...
  SECRETS_MANAGER: AWS_PARAMETER_STORE
  SECRETS_SELECTION: /dataops/SNOWFLAKE/SOLE/
  SECRETS_STRIP_PREFIX: /dataops/
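
For illustration, the four parameters could be created with the AWS CLI as SecureString values matching the /dataops/SNOWFLAKE/SOLE/ paths above:

# Illustrative example: create each parameter as a SecureString
for key in ACCOUNT PASSWORD ROLE USERNAME; do
  aws ssm put-parameter \
    --name "/dataops/SNOWFLAKE/SOLE/${key}" \
    --type SecureString \
    --value "<value-for-${key}>"
done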

Using Azure KeyVault

If you are using Azure KeyVault, your configuration should look something like this (with your details substituted):

Azure KeyVault

SNOWFLAKE-SOLE-ROLE
SNOWFLAKE-SOLE-PASSWORD
SNOWFLAKE-SOLE-USERNAME
SNOWFLAKE-SOLE-ACCOUNT

and in your pipelines/includes/config/variables.yml include:

pipelines/includes/config/variables.yml
variables:
  ...
  SECRETS_MANAGER: AZURE_KEY_VAULT
  SECRETS_AZURE_KEY_VAULT_URL: https://KEY_VAULT_NAME.vault.azure.net/
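
For illustration, assuming the vault behind the KEY_VAULT_NAME URL above, the four secrets could be created with the Azure CLI:

# Illustrative example: create each KeyVault secret; names must match the list above
for key in ACCOUNT USERNAME PASSWORD ROLE; do
  az keyvault secret set \
    --vault-name KEY_VAULT_NAME \
    --name "SNOWFLAKE-SOLE-${key}" \
    --value "<value-for-${key}>"
done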

Step 4 - Set up a runner on AWS EC2 or Azure VM

Follow the detailed instructions on how to set up a runner in the DataOps Administration documentation.

Step 5 - Test your project

At this point, you should be able to run the project using the full-ci.yml pipeline. To run your first pipeline:

  1. Hover over "CI/CD" in the left-hand navigation and select "Pipelines" from the sub-menu that appears.
  2. On the pipeline page, click the "Run Pipeline" button in the upper right to open the run pipeline page.
  3. Change the Pipeline Type dropdown to "full-ci.yml" and click the "Run Pipeline" button at the bottom of the form to start the pipeline.

Step 6 - Create development environments

At this point, you should have a fully working DataOps pipeline. The Snowflake Object Lifecycle Engine (SOLE) will have created some base infrastructure. Some other things you can try now:

  • Create a qa branch from main and run the full-ci.yml pipeline in this branch. This creates a QA environment in Snowflake.
  • Create a dev branch from main and run the full-ci.yml pipeline in this branch. This creates a Dev environment in Snowflake. The DEV database will be created from the PROD database using zero-copy clones.
  • Create a my-first-feature branch from dev and run the full-ci.yml pipeline in this branch. This creates a feature branch environment in Snowflake. The feature branch database will be created from the DEV database using zero-copy clones.
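
These branches can be created in the web UI or with plain git. A minimal sketch, assuming your project remote is called origin (run the full-ci.yml pipeline after pushing each branch):

# qa and dev branch from main; the feature branch branches from dev
git checkout main && git pull
git checkout -b qa main && git push -u origin qa
git checkout -b dev main && git push -u origin dev
git checkout -b my-first-feature dev && git push -u origin my-first-feature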

See DataOps Environments for more information on the environments used with DataOps.live, and DataOps Sample Development Workflow for a usage example.