
Experimental Developer Use Cases

Feature release status: Private Preview

danger

This page contains ideas about how to use the DataOps CDE to achieve more advanced use cases. The code and examples here are subject to change at any time and are very likely not the best long-term way to achieve these use cases. However, in line with our philosophy of transparency, we are putting them out to collect feedback and other ideas from the community. Partner with us to make them better!

SOLE compilation and validation

It's virtually impossible to test SOLE without pointing it at a full Snowflake tenant and giving it full SOLE access credentials. While both of those are possible, we still recommend running SOLE configurations as part of a pipeline as the best way to test them. However, a modification to a SOLE configuration often includes a simple formatting error that isn't discovered for several minutes, until the SOLE job runs in the pipeline. The goal here is to provide the ability to validate a SOLE configuration locally. In the future, we will build the same SOLE compilation libraries into the DataOps CDE itself. Still, in advance of this, we can use one of the most powerful features of the DataOps CDE: its ability to run Docker containers. Yes, that's right, inside a cloud-based, containerized CDE, you can run containers... it's turtles all the way down.

Let's try this out in a simple way first. In your DataOps CDE terminal, just run docker run hello-world.
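If Docker is available inside the CDE, the output should look something like this (abridged; the exact wording depends on your Docker version):

$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
...
Hello from Docker!
This message shows that your installation appears to be working correctly.
...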

That was easy! Running a DataOps Orchestrator is a little more complicated, as it expects several things to be in place, such as a DataOps Vault. However, for local testing, you can mock most of this with environment variables. You can create a script in your repository at scripts/orchestrated_sole_run.sh (or anywhere else you like) that looks like this:

#!/bin/bash
set -e

# Set all base environment variables and profiles
/dataops-cde/scripts/dataops_cde_init.sh

# turns /workspace/xxx into /build/project/xxx
export DATAOPS_REPO_ROOT=$(sed 's~/workspace~/build/project~g'<<<$GITPOD_REPO_ROOT)

# Set the default actions; see https://docs.dataops.live/docs/orchestration/snowflakeobjectlifecycle-orchestrator/#lifecycle_action for more options
# some other quick execution options:
# DISABLE_PERSISTENT_CACHE=1 LIFECYCLE_ACTION=VALIDATE LIFECYCLE_MANAGE_OBJECT=DATABASE scripts/orchestrated_sole_run.sh
# DISABLE_PERSISTENT_CACHE=1 LIFECYCLE_ACTION=VALIDATE LIFECYCLE_MANAGE_OBJECT=DATABASE_LEVEL scripts/orchestrated_sole_run.sh
# DISABLE_PERSISTENT_CACHE=1 LIFECYCLE_ACTION=VALIDATE LIFECYCLE_MANAGE_OBJECT=ACCOUNT_LEVEL scripts/orchestrated_sole_run.sh
export LIFECYCLE_ACTION="${LIFECYCLE_ACTION:-COMPILE}"
export LIFECYCLE_MANAGE_OBJECT="${LIFECYCLE_MANAGE_OBJECT:-ACCOUNT_LEVEL}"

# Allow a tiny temporary vault to be built
export DATAOPS_VAULT_KEY=$RANDOM # anything will do here
export DATAOPS_VAULT_SALT_FILE=/etc/hosts # anything will do here
export DATAOPS_TEMPLATES_DIR=/tmp/local_config
export DATAOPS_SECONDARY_TEMPLATES_DIR=$DATAOPS_REPO_ROOT/vault-content
export CI_JOB_ID=manual
export REPORT_DIR=/tmp
export RUNNER=manual

# Set variables for SOLE to connect.
# These can be lower privileges than the regular SOLE role since we aren't making any changes to Snowflake.
export DATAOPS_SOLE_ACCOUNT=$DBT_ENV_SECRET_ACCOUNT
export DATAOPS_SOLE_USERNAME=$DBT_ENV_SECRET_USER
export DATAOPS_SOLE_PASSWORD=$DBT_ENV_SECRET_PASSWORD
export DATAOPS_SOLE_ROLE=$DBT_ENV_ROLE
export DATAOPS_SOLE_WAREHOUSE=$DBT_ENV_WAREHOUSE
export CI_PROJECT_DIR=$DATAOPS_REPO_ROOT
export CONFIGURATION_DIR=$DATAOPS_REPO_ROOT/dataops/snowflake

# Store all the variables in a file to pass into the Orchestrator
printenv | egrep "DBT|DATAOPS|TRANSFORM|CI|REPORT|LIFECYCLE_|CONFIGURATION_DIR|DISABLE_PERSISTENT_CACHE|RUNNER" > /tmp/cde.env

# Run the orchestrator itself
docker run -it --env-file=/tmp/cde.env -v /home/gitpod/.dbt:/tmp/local_config -v $GITPOD_REPO_ROOT/:$DATAOPS_REPO_ROOT dataopslive/dataops-snowflakeobjectlifecycle-orchestrator:5-stable bash -c "rm -f /teardown-scripts/90* && /dataops"

Don't forget to make this script executable with:

chmod +x scripts/orchestrated_sole_run.sh

This also requires a local /workspace/truedataops-22/vault-content/vault.template.yml that includes a SOLE section similar to:

SNOWFLAKE:
  ACCOUNT: "{{ env.DBT_ENV_SECRET_ACCOUNT }}"
  MASTER:
    USERNAME: "{{ env.DBT_ENV_SECRET_USER }}"
    PASSWORD: "{{ env.DBT_ENV_SECRET_PASSWORD }}"
    ROLE: "{{ env.DBT_ENV_ROLE }}"
  TRANSFORM:
    USERNAME: "{{ env.DBT_ENV_SECRET_USER }}"
    PASSWORD: "{{ env.DBT_ENV_SECRET_PASSWORD }}"
    ROLE: "{{ env.DBT_ENV_ROLE }}"
    WAREHOUSE: "{{ env.DBT_ENV_SECRET_WAREHOUSE }}"
    THREADS: 16
  INGESTION:
    USERNAME: "{{ env.DBT_ENV_SECRET_USER }}"
    PASSWORD: "{{ env.DBT_ENV_SECRET_PASSWORD }}"
    ROLE: "{{ env.DBT_ENV_ROLE }}"
    WAREHOUSE: "{{ env.DBT_ENV_SECRET_WAREHOUSE }}"
    THREADS: 16
  SOLE:
    ACCOUNT: "{{ env.DBT_ENV_SECRET_ACCOUNT }}"
    USERNAME: "{{ env.DBT_ENV_SECRET_USER }}"
    PASSWORD: "{{ env.DBT_ENV_SECRET_PASSWORD }}"
    ROLE: "{{ env.DBT_ENV_ROLE }}"
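Before running the script, it's worth checking that the DBT_ENV_* variables referenced by the script and the template are actually set in your CDE session. A quick check along these lines (only printenv, grep, and cut, nothing DataOps-specific) lists what's available without echoing any secret values:

# list which DBT_ENV_* variables are set; names only, so no secrets are printed
printenv | grep -E "^DBT_ENV" | cut -d= -f1 | sort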

Once you have the script and the vault template in place, you can run:

scripts/orchestrated_sole_run.sh

The output will look very familiar, since it's the same Orchestrator that is used when SOLE runs in a pipeline.

Any SOLE compilation errors show up within seconds, rather than several minutes into a pipeline run.
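To see this for yourself, you can deliberately break one of the SOLE configuration files under dataops/snowflake/ and rerun the script. The sketch below assumes a file called databases.template.yml exists in that directory; adjust the filename to whatever your project actually uses:

# from the repository root: append an unclosed flow sequence, which is invalid YAML
echo "broken: [unclosed" >> dataops/snowflake/databases.template.yml

# the compile should now fail within seconds
scripts/orchestrated_sole_run.sh || echo "compile failed as expected"

# restore the file afterwards
git checkout -- dataops/snowflake/databases.template.yml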

By overriding a couple of the variables, you can get SOLE to perform deeper configuration validation (the VALIDATE action) at each object level, for example:

DISABLE_PERSISTENT_CACHE=1 LIFECYCLE_ACTION=VALIDATE LIFECYCLE_MANAGE_OBJECT=DATABASE scripts/orchestrated_sole_run.sh
DISABLE_PERSISTENT_CACHE=1 LIFECYCLE_ACTION=VALIDATE LIFECYCLE_MANAGE_OBJECT=DATABASE_LEVEL scripts/orchestrated_sole_run.sh
DISABLE_PERSISTENT_CACHE=1 LIFECYCLE_ACTION=VALIDATE LIFECYCLE_MANAGE_OBJECT=ACCOUNT_LEVEL scripts/orchestrated_sole_run.sh
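If you find yourself running all three regularly, a small wrapper can loop over the object levels for you. This is just a convenience sketch: the script name validate_all_sole.sh is made up, and it assumes the scripts/orchestrated_sole_run.sh shown above:

#!/bin/bash
# hypothetical helper: scripts/validate_all_sole.sh
# runs the VALIDATE action against each SOLE object level in turn
set -e

for level in DATABASE DATABASE_LEVEL ACCOUNT_LEVEL; do
  echo "== Validating ${level} =="
  DISABLE_PERSISTENT_CACHE=1 \
    LIFECYCLE_ACTION=VALIDATE \
    LIFECYCLE_MANAGE_OBJECT="${level}" \
    scripts/orchestrated_sole_run.sh
done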
