Key Developer Use Cases
The DataOps Development Environment (DDE) optimizes many developer use cases, including transformation and automated testing with MATE. The following sections demonstrate a few example use cases you can work through in the development environment.
MATE use cases
The examples on this page use the DataOps Development Environment with the cloud deployment, DDE Cloud.
Modifying an existing MATE model
This example shows how to make simple modifications to an existing DataOps model and automatically preview the results after entering each new SQL line. Let's assume you need to edit a product table to add new columns with specific names and populate them with content.
Under your top-level Projects menu, click the project you want to open.
From the project menu buttons, click DataOps CDE to open the development environment.
Navigate to the project modelling SQL file and duplicate the statement for the column NAME.
The column NAME is duplicated automatically.
Edit the statements and name the first column productcategory and the second productsubcategory.
Add the coalesce function to the column statements to fill the columns with Misc. The columns are automatically populated with the new text.
Enter another SQL statement to add the new ListPrice column to the table; the resulting model is sketched after these steps.
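After these edits, the relevant part of the modelling SQL might look something like the following sketch. The table and column names are illustrative, following the steps above rather than a specific project:
-- illustrative sketch only: column and table names follow the steps above
select
    name,
    coalesce(productcategory, 'Misc') as productcategory,
    coalesce(productsubcategory, 'Misc') as productsubcategory,
    ListPrice
from {{ ref('product') }}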
Adding a test to an existing MATE model
This example shows how to add a test to a layer in an existing DataOps model and automatically preview the results of running the tests. Let's assume you want to test the data in specific columns of the customer table.
Under your top-level Projects menu, click the project you want to open.
From the project menu buttons, click DataOps CDE to open the development environment.
Navigate to the project YAML file and add tests to check for null values in specific columns in the customer table.
Select dbt Power User on the left vertical bar and click the run button in the top right corner of the panel.
The terminal console shows the results of all the tests defined in the .yml file.
Enter another test condition in the YAML file to test relationships to a field in another table and rerun the test; both tests are sketched after these steps.
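In the .yml file, the tests added in these steps might look something like the following sketch. The model, column, and referenced table names are assumptions for illustration; not_null and relationships are standard dbt tests:
version: 2
models:
  - name: customer
    columns:
      - name: customerid
        tests:
          - not_null
      - name: lastname
        tests:
          - not_null
      - name: territoryid
        tests:
          # checks that every territoryid in customer exists in salesterritory
          - relationships:
              to: ref('salesterritory')
              field: territoryid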
Testing MATE model changes
This example shows how to check that changes to the DataOps model haven't broken anything downstream.
Open the project modeling SQL file and add a test to retrieve the ID of all business entities.
Run the test via the dbt Power User icon.
The terminal console shows the results, pointing out any invalid identifiers.
Remove the statement from the SQL file and run the child models via the dbt Power User icon.
Keep switching between running the test to retrieve the ID of all business entities and running the child models until you reach the expected results.
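If you prefer the terminal over the dbt Power User panel, the equivalent dbt selection syntax looks something like this (the model name businessentity is an assumption for illustration):
cd dataops/modelling/
# run the changed model and every model downstream of it (its children)
dbt run --select businessentity+
# run only the tests defined for that model
dbt test --select businessentity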
Delta runs
A powerful feature of dbt is the ability to work out which model definitions have changed since a specific baseline. The /dataops-cde/scripts/dataops_cde_setup.sh script, which runs every time a workspace is started, creates this baseline and stores it in /workspace/state/. If you want to run a command (build/test run the same way) for all the models that you have modified in some way since this baseline, you can run:
cd dataops/modelling/
dbt run --select state:modified+ --state /workspace/state/
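If you first want to see which models dbt considers modified relative to this baseline, without running anything, you can list them with the same selector:
dbt ls --select state:modified+ --state /workspace/state/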
Streamlit use cases
Rapid development of Streamlit app
This example assumes a Streamlit project at dataops/streamlit/houseprices.
Streamlit itself supports dynamic reloading, so if your app is simply something like app.py, then in a terminal, just run:
streamlit run app.py
This runs the Streamlit app and shows a preview in the DDE Cloud:
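For reference, such a minimal app.py could be little more than the following sketch (the content is purely illustrative):
# app.py - a minimal illustrative Streamlit app
import streamlit as st

st.title("House prices")
st.write("Hello from the DDE Cloud")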
However, many customers will want to render Streamlit apps specific to different branches/environments, so a Streamlit app built in the dev branch will point to the <DATAOPS_PREFIX>_DEV database.
The simplest way of doing this is to develop your Streamlit app as a Jinja2 template, e.g., app.template.py, and then use the DataOps renderer to render this. You can add a build.sh script to your Streamlit app directory that looks something like this:
#!/bin/bash
# Ensure the environment variables and other environment-specific details are set
/dataops-cde/scripts/dataops_cde_init.sh
# Run the DataOps template renderer to render app.template.py to app.py
/home/gitpod/.pyenv/versions/3.8.13/bin/python /dataops-cde/scripts/dataops-render -o --file $STREAMLIT_FOLDER/houseprices/app.template.py render-template
# Tell the DDE Cloud to expect to be previewing a web application on port 8501
gp preview $(gp url 8501)
# Run the Streamlit app
streamlit run ${STREAMLIT_FOLDER}/houseprices/app.py --server.port 8501 --server.allowRunOnSave true --server.runOnSave true --browser.gatherUsageStats False
You must change some minor details, like the folder name, to make this simple script work.
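For context, app.template.py itself is just a normal Streamlit script containing Jinja2 placeholders that the renderer resolves per environment. A minimal sketch might look like this; the env.DATAOPS_DATABASE variable name and the app content are assumptions for illustration:
# app.template.py - rendered to app.py by the DataOps renderer
import streamlit as st

# The Jinja2 placeholder below is resolved at render time; the variable name is illustrative.
DATABASE = "{{ env.DATAOPS_DATABASE }}"

st.title("House prices")
st.write(f"Reading from the environment-specific database: {DATABASE}")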
Now when you run build.sh, it will render and preview the Streamlit app:
This means you can do some work on your Streamlit app, move to the terminal, Ctrl-C out of the build.sh, rerun it, and see your new app within seconds. Good, but we can do better. The problem here is that we are saving app.template.py, and Streamlit is watching app.py for changes. We need something that detects every time app.template.py changes and reruns the render. For this, you can create a simple watch.sh script:
#!/bin/bash
# This watches for changes to the app.template.py and when they are detected, reruns the build.sh
watchfiles "${STREAMLIT_FOLDER}/houseprices/build.sh" $STREAMLIT_FOLDER/houseprices/app.template.py --sigint-timeout 0 --sigkill-timeout 5
You can now run this instead of the build.sh (since it runs build.sh itself), and every time it detects a change to app.template.py, it reruns the build.sh, which rerenders and previews:
Snowpark development use cases
Simple data frames example
The DDE Cloud is already fully pre-configured with all the main libraries and tools required for Snowpark development and testing.
If you look at the simplest script for developing with Snowpark using DataFrames:
from snowflake.snowpark import Session
import os

# Read the Snowflake connection details from environment variables
connection_parameters = {
    "account": os.environ['SNOWPARK_ACCOUNT'],
    "user": os.environ['SNOWPARK_USER'],
    "password": os.environ['SNOWPARK_PASSWORD'],
    "role": os.environ['SNOWPARK_ROLE'],
    "warehouse": os.environ['SNOWPARK_WAREHOUSE'],
    "database": os.environ['SNOWPARK_DATABASE'],
    "schema": os.environ['SNOWPARK_SCHEMA']
}

# Open a Snowpark session
session = Session.builder.configs(connection_parameters).create()

# Create a small DataFrame, filter it, and display the result
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
df.show()

# Convert to a pandas DataFrame and collect the rows locally
pandas_df = df.to_pandas()
result = df.collect()
You can simply click the Play button in the top right-hand corner, and the DDE Cloud will run this code locally, using Snowpark to execute the relevant pieces within Snowflake:
UDF/Stored procedure example
To take a slightly more advanced example, suppose you want to create a User Defined Function (UDF) or a Stored Procedure (SPROC) of your own design, e.g.:
from snowflake.snowpark import Session, GroupingSets
from snowflake.snowpark.functions import col,udf,sproc
from snowflake.snowpark.types import IntegerType
from snowflake.snowpark.functions import row_number
import os
import snowflake.snowpark
from snowflake.connector import connect
connection_parameters = {
    "account": os.environ['SNOWPARK_ACCOUNT'],
    "user": os.environ['SNOWPARK_USER'],
    "password": os.environ['SNOWPARK_PASSWORD'],
    "role": os.environ['SNOWPARK_ROLE'],
    "warehouse": os.environ['SNOWPARK_WAREHOUSE'],
    "database": os.environ['SNOWPARK_DATABASE'],
    "schema": os.environ['SNOWPARK_SCHEMA']
}
session = Session.builder.configs(connection_parameters).create()
# Register a stored procedure named udf_demo that returns the current account
@sproc(name="udf_demo", replace=True, packages=["numpy","xgboost","snowflake-snowpark-python","scikit-learn"])
def my_copy(session: snowflake.snowpark.Session) -> str:
    return session.get_current_account()
print("Running SPROC and UDF ...")
print(session.sql("call udf_demo()").collect())
print("Done SPROC and UDF ...")
Again use the Play button to run this: