
Key Developer Use Cases

Feature release status: Private Preview

The DataOps Cloud Development Environment (CDE) streamlines many use cases, including transformation and automated testing with MATE. The following sections demonstrate a few example use cases you can achieve with the development environment.

MATE use cases

Modify an existing MATE model

This example shows how to make simple modifications to an existing DataOps model and automatically preview the results after entering each new SQL line. Let's assume you need to edit a product table to add new columns with specific names and populate them with content.

  1. Under your top-level Projects menu, click the project you want to open.

  2. From the project menu buttons, click DataOps CDE to open the development environment.

  3. Navigate to the project modelling SQL file and duplicate the statement that selects the NAME column.

    The column NAME is duplicated automatically.

  4. Edit the statements and name the first column productcategory and the second productsubcategory.

  5. Add the coalesce function to the column statements to fill any null values with Misc.

    The columns are automatically populated with the new text.

  6. Enter another SQL statement to add the new ListPrice column to the table.
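Putting these steps together, the resulting model might look something like the following minimal sketch. The source column names (productcategoryname, productsubcategoryname, listprice) and the referenced model (stg_product) are assumptions for illustration; the output column names come from the steps above:

```sql
select
    -- Steps 3-5: duplicated column statements, renamed, with nulls filled as 'Misc'
    coalesce(productcategoryname, 'Misc') as productcategory,
    coalesce(productsubcategoryname, 'Misc') as productsubcategory,
    -- Step 6: the new ListPrice column
    listprice as ListPrice
from {{ ref('stg_product') }}
```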

Add a test to an existing MATE model

This example shows how to add a test to a layer in an existing DataOps model and automatically preview the results of running the tests. Let's assume you want to test selecting the data for a specific customer.

  1. Under your top-level Projects menu, click the project you want to open.

  2. From the project menu buttons, click DataOps CDE to open the development environment.

  3. Navigate to the project YAML file and add tests to check for null values in specific columns in the customer table.

  4. Select dbt Power User on the left vertical bar and click the run button in the top right corner of the panel.

    The terminal console shows the results of all the tests defined in the .yml file.

  5. Enter another test condition in the YAML file to test relationships to a field in another table and rerun the test.
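The tests described in steps 3 and 5 might look something like this in the model's YAML file. The model, column, and referenced table names here are assumptions for illustration:

```yaml
models:
  - name: customer
    columns:
      - name: customerid
        tests:
          - not_null            # step 3: check for null values
      - name: territoryid
        tests:
          - not_null
          - relationships:      # step 5: value must exist in another table
              to: ref('salesterritory')
              field: territoryid
```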

Test MATE model changes

This example shows how to verify that changes to the DataOps model haven't broken anything downstream.

  1. Open the project modelling SQL file and add a test to retrieve the ID of all business entities.

  2. Run the test via the dbt Power User icon.

    The terminal console shows the results pointing out any invalid identifier.

  3. Remove the statement from the SQL file and run the child models via the dbt Power User icon.

  4. Keep switching between running the test to retrieve the ID of all business entities and running the child models until you reach the expected results.
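The runs in steps 2-4 can also be performed from a terminal using standard dbt node selection; the model name businessentity below is an assumption for illustration:

```bash
cd dataops/modelling/
# Run only the tests defined for this model
dbt test --select businessentity
# Run the model and all of its children (the trailing "+" selects downstream models)
dbt run --select businessentity+
```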

Delta runs

A powerful feature of dbt is its ability to work out which model definitions have changed since a specific baseline. The /dataops-cde/scripts/dataops_cde_setup.sh script, which runs every time a workspace is started, creates this baseline and stores it in /workspace/state/. If you want to run a command (build and test work the same way) against all the models you have modified since this baseline, simply run:

cd dataops/modelling/
dbt run --select state:modified+ --state /workspace/state/

Streamlit use cases

Rapid development of a Streamlit app

This example assumes a Streamlit project at dataops/streamlit/houseprices.

Streamlit itself supports dynamic reloading, so if your app is a single file such as app.py, just run the following in a terminal:

streamlit run app.py

This runs the Streamlit app and shows a preview in the CDE.


However, many customers will want to render Streamlit apps specific to different branches/environments, so a Streamlit app built in the dev branch will point to the <DATAOPS_PREFIX>_DEV database.

The simplest way of doing this is to develop your Streamlit app as a Jinja2 template e.g. app.template.py and then use the DataOps renderer to render this. You can add a build.sh script to your Streamlit app directory that looks something like this:

#!/bin/bash
# Ensure the environment variables and other environment-specific details are set
/dataops-cde/scripts/dataops_cde_init.sh

# Run the DataOps template renderer to render app.template.py to app.py
/home/gitpod/.pyenv/versions/3.8.13/bin/python /dataops-cde/scripts/dataops-render -o --file $STREAMLIT_FOLDER/houseprices/app.template.py render-template

# Tell the CDE to expect to be previewing a web application on port 8501
gp preview $(gp url 8501)

# Run the Streamlit app
streamlit run ${STREAMLIT_FOLDER}/houseprices/app.py --server.port 8501 --server.allowRunOnSave true --server.runOnSave true --browser.gatherUsageStats False

You will need to change some small details, such as the folder name, to make this simple script work.
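For reference, app.template.py might look something like the following minimal sketch. The {{ env.DATAOPS_DATABASE }} variable is an assumption about the renderer's template context, not a confirmed name; check the DataOps renderer documentation for the variables available in your project:

```python
# app.template.py -- hypothetical sketch; the Jinja2 variable below is an assumption
import streamlit as st

# Rendered per branch/environment by the DataOps renderer,
# e.g. to "<DATAOPS_PREFIX>_DEV" in the dev branch
DATABASE = "{{ env.DATAOPS_DATABASE }}"

st.title("House Prices")
st.caption(f"Reading from database: {DATABASE}")
```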

Now when you run build.sh it will render and preview the Streamlit app.


This means you can do some work on your Streamlit app, move to the terminal, Ctrl-C out of the build.sh, run it again and see your new app within seconds. Good, but we can do better. The problem here is that we are saving app.template.py and Streamlit is watching app.py for changes. We need something that detects every time app.template.py changes and reruns the render. For this you can create a simple watch.sh script:

#!/bin/bash
# This watches for changes to the app.template.py and when they are detected, reruns the build.sh
watchfiles "${STREAMLIT_FOLDER}/houseprices/build.sh" $STREAMLIT_FOLDER/houseprices/app.template.py --sigint-timeout 0 --sigkill-timeout 5

You can now run this instead of build.sh (since it runs build.sh itself); every time it detects a change to app.template.py, it reruns build.sh, which rerenders and previews the app.

Snowpark development use cases

Simple Data Frames example

The DataOps CDE is already fully pre-configured with all the main libraries and tools required to do Snowpark development and testing.

Consider this simple script for developing with Snowpark using DataFrames:

from snowflake.snowpark import Session
import os

# Read the connection details from environment variables pre-set in the CDE
connection_parameters = {
    "account": os.environ['SNOWPARK_ACCOUNT'],
    "user": os.environ['SNOWPARK_USER'],
    "password": os.environ['SNOWPARK_PASSWORD'],
    "role": os.environ['SNOWPARK_ROLE'],
    "warehouse": os.environ['SNOWPARK_WAREHOUSE'],
    "database": os.environ['SNOWPARK_DATABASE'],
    "schema": os.environ['SNOWPARK_SCHEMA'],
}

session = Session.builder.configs(connection_parameters).create()

# Build a small DataFrame, filter it, and pull the results back locally
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
df.show()
pandas_df = df.to_pandas()
result = df.collect()

You can simply click the Play button in the top right-hand corner and the DataOps CDE will run this code locally, using Snowpark to execute the relevant pieces within Snowflake.

UDF/Stored Procedure example

To take a slightly more advanced example, suppose you want to create a User Defined Function (UDF) or a Stored Procedure (SPROC) of your own design:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc
import snowflake.snowpark
import os

# Read the connection details from environment variables pre-set in the CDE
connection_parameters = {
    "account": os.environ['SNOWPARK_ACCOUNT'],
    "user": os.environ['SNOWPARK_USER'],
    "password": os.environ['SNOWPARK_PASSWORD'],
    "role": os.environ['SNOWPARK_ROLE'],
    "warehouse": os.environ['SNOWPARK_WAREHOUSE'],
    "database": os.environ['SNOWPARK_DATABASE'],
    "schema": os.environ['SNOWPARK_SCHEMA'],
}

session = Session.builder.configs(connection_parameters).create()

# Register this local function as a stored procedure in Snowflake
@sproc(name="udf_demo", replace=True, packages=["numpy", "xgboost", "snowflake-snowpark-python", "scikit-learn"])
def my_copy(session: snowflake.snowpark.Session) -> str:
    return session.get_current_account()

print("Running SPROC and UDF ...")
print(session.sql("call udf_demo()").collect())
print("Done SPROC and UDF ...")

Again, use the Play icon to run this code.