
Key Developer Use Cases

Feature release status badge: PriPrev

The DataOps Development Environment (DDE) supports many use cases, including transformation and automated testing with the DataOps Modeling and Transformation Engine (MATE). The following sections demonstrate a few example use cases you can achieve with the development environment.

MATE use cases

info

Examples on this page use the DDE DevReady, the DataOps Development Environment in your browser.

Switching dbt version in DDE

caution

As of October 2023, the DDE no longer supports dbt 1.0. We strongly recommend upgrading to a later supported dbt version, such as 1.4 or 1.5, for better performance and access to new features.

The DDE uses dbt 1.0 by default, but you can change this version through the DataOps.live VS Code extension. You can see the steps for this in the DDE DevReady and DDE DevPod walkthroughs.

If you have already set up the DDE, you can change the dbt version by clicking the DataOps icon in the left vertical bar and then clicking the DDE Setup Walkthrough button in the top left.


You can either choose from a list of dbt versions that come with the MATE packages, or select a custom version of dbt, in which case you get only the packages defined in your packages.yml file.
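
For reference, a minimal packages.yml follows dbt's standard format; the package and version below are illustrative, not a DataOps requirement:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1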

Setting dbt version for a single user or for all project users

The version of dbt used in both the DDE DevReady and DevPod is stored in your VS Code settings, so you can add this setting at any of the scopes VS Code provides.

To set this for all users of a project, save the dbt version setting at the workspace level. This is the default and creates a file called settings.json inside the .vscode folder in your project. To share this version with all users of the project, commit and push this file to your project repository.

To set this only for yourself, save the dbt version setting at the user level. This affects only your installation of VS Code, and you must set it manually instead of through the walkthrough.


The settings.json file stores the dbt version for the current project/workspace. This setting takes precedence over any others: even if the same value is also set at the User and Remote Settings scopes, only the workspace-level setting is used.
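
As an illustration, the workspace .vscode/settings.json might contain an entry like the one below. The setting key is defined by the DataOps.live VS Code extension, so treat the name shown here as a placeholder and check the walkthrough for the real one:

{
  // Placeholder key -- the extension defines the actual setting name
  "dataops.dbtVersion": "1.5"
}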

MATE packages in the DDE

The DDE includes MATE packages, but only for supported versions of dbt. You can still use MATE packages with an unsupported dbt version, but you must install them manually. For more information about switching the dbt version in the MATE orchestrator, see Switching dbt Version.
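
If you do need to install packages manually, one standard approach is to declare them in your packages.yml and pull them with dbt deps; a minimal sketch, assuming the default project layout:

cd dataops/modelling/
dbt deps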

Modifying an existing MATE model

This example shows how to make simple modifications to an existing DataOps model and automatically preview the results after entering each new SQL line. Let's assume you need to edit a product table to add new columns with specific names and populate them with content.

  1. Under your top-level Projects menu, click the project you want to open.

  2. From the project menu buttons, click DataOps DDE to open the development environment.

  3. Navigate to the project modeling SQL file and duplicate the statement that selects the NAME column.

    The column NAME is duplicated automatically.

  4. Edit the statements and name the first column productcategory and the second productsubcategory.

  5. Add the coalesce function to the column statements to fill the columns with Misc.

    The columns are automatically populated with the new text.

  6. Enter another SQL statement to add the new ListPrice column to the table (see the sketch after this list).
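
Putting steps 4-6 together, the edited column statements might look something like the following; the source column and table names here are illustrative, not taken from the actual project:

select
    coalesce(category_name, 'Misc') as productcategory,
    coalesce(subcategory_name, 'Misc') as productsubcategory,
    list_price as ListPrice
from {{ ref('stg_product') }}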

Adding a test to an existing MATE model

This example shows how to add a test to a layer in an existing DataOps model and automatically preview the results of running the tests. Let's assume you want to test selecting the data for a specific customer.

  1. Under your top-level Projects menu, click the project you want to open.

  2. From the project menu buttons, click DataOps DDE to open the development environment.

  3. Navigate to the project YAML file and add tests to check for null values in specific columns in the customer table.

  4. Select dbt Power User on the left vertical bar and click the run button in the panel's top-right corner.

    The terminal console shows the results of the tests defined in the .yml file.

  5. Enter another test condition in the YAML file to test relationships to a field in another table, and rerun the test (see the sketch after this list).
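
For illustration, the tests from steps 3 and 5 might look like the following in the model's YAML file; the model, column, and related-table names are hypothetical:

version: 2
models:
  - name: customer
    columns:
      - name: customer_id
        tests:
          - not_null
      - name: email
        tests:
          - not_null
      - name: store_id
        tests:
          - relationships:
              to: ref('store')
              field: store_id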

Testing MATE model changes

This example shows how to check that the changes added to the DataOps model haven't broken anything downstream.

  1. Open the project modeling SQL file and add a test to retrieve the ID of all business entities.
  2. Run the test via the dbt Power User icon.

The terminal console shows the results pointing out any invalid identifier.

  3. Remove the statement from the SQL file and run the children (downstream) models via the dbt Power User icon.

  4. Keep switching between running the test to retrieve the ID of all business entities and running the children models until you reach the expected results.
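
Behind the buttons, dbt Power User drives standard dbt node selectors, so you can run the same loop from a terminal. A rough equivalent, assuming the model is called business_entity (the name is illustrative):

# Run the model and everything downstream of it (its children)
dbt run --select business_entity+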

Delta runs

A powerful feature of dbt is the ability to work out which model definitions have changed since a specific baseline. The /dataops-cde/scripts/dataops_cde_setup.sh script, which runs every time a workspace is started, creates this baseline and stores it in /workspace/state/. If you want to run a command (build and test work the same way) for all the models you have modified in some way since this baseline, run:

cd dataops/modelling/
dbt run --select state:modified+ --state /workspace/state/

Streamlit use cases

Rapid development of a Streamlit app

This example assumes a Streamlit project at dataops/streamlit/houseprices.

Streamlit itself supports dynamic reloading, so if your app is simply something like app.py, then in a terminal, just run:

streamlit run app.py

This runs the Streamlit app and shows a preview in the DDE.

However, many customers will want to render Streamlit apps specific to different branches/environments, so a Streamlit app built in the Dev branch will point to the <DATAOPS_PREFIX>_DEV database.

The simplest way of doing this is to develop your Streamlit app as a Jinja2 template, e.g., app.template.py, and then use the DataOps renderer to render this. You can add a build.sh script to your Streamlit app directory that looks something like this:

#!/bin/bash
# Ensure the environment variables and other environment-specific details are set
/dataops-cde/scripts/dataops_cde_init.sh

# Run the DataOps template renderer to render app.template.py to app.py
/home/gitpod/.pyenv/versions/3.8.13/bin/python /dataops-cde/scripts/dataops-render -o --file $STREAMLIT_FOLDER/houseprices/app.template.py render-template

# Tell the DDE DevReady to expect to be previewing a web application on port 8501
gp preview $(gp url 8501)

# Run the Streamlit app
streamlit run ${STREAMLIT_FOLDER}/houseprices/app.py --server.port 8501 --server.allowRunOnSave true --server.runOnSave true --browser.gatherUsageStats False

You must change some minor details, such as the folder name, to make this simple script work.

Now when you run build.sh, it renders and previews the Streamlit app.

This means you can do some work on your Streamlit app, move to the terminal, "Ctrl-C" out of the build.sh, rerun it, and see your new app within seconds.

Good, but we can do better. The problem here is that we are saving app.template.py, and Streamlit is watching app.py for changes. We need something that detects every time app.template.py changes and reruns the render. For this, you can create a simple watch.sh script:

#!/bin/bash
# This watches for changes to the app.template.py and when they are detected, reruns the build.sh
watchfiles "${STREAMLIT_FOLDER}/houseprices/build.sh" "${STREAMLIT_FOLDER}/houseprices/app.template.py" --sigint-timeout 0 --sigkill-timeout 5

You can now run this instead of the build.sh, since it runs build.sh itself; every time it detects a change to app.template.py, it reruns the build.sh, which rerenders and previews the app.

Snowpark use cases

Simple data frames example

The DDE is fully pre-configured with all the main libraries and tools required for Snowpark development and testing.

Consider the most straightforward script for developing with Snowpark using DataFrames:

from snowflake.snowpark import Session
import os

# Read the connection details from environment variables
connection_parameters = {
    "account": os.environ['SNOWPARK_ACCOUNT'],
    "user": os.environ['SNOWPARK_USER'],
    "password": os.environ['SNOWPARK_PASSWORD'],
    "role": os.environ['SNOWPARK_ROLE'],
    "warehouse": os.environ['SNOWPARK_WAREHOUSE'],
    "database": os.environ['SNOWPARK_DATABASE'],
    "schema": os.environ['SNOWPARK_SCHEMA']
}

# Open a Snowpark session against Snowflake
session = Session.builder.configs(connection_parameters).create()

# Build a small DataFrame, filter it, and print the result
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
df.show()

# Pull the results back locally, as a pandas DataFrame and as Row objects
pandas_df = df.to_pandas()
result = df.collect()

You can click the Play button in the top right-hand corner, and DDE DevReady will run this code locally, using Snowpark to execute the relevant pieces within Snowflake.

UDF/Stored procedure example

For a slightly more advanced example, suppose you want to create a User-Defined Function (UDF) or a Stored Procedure (SPROC) of your own design:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc
import snowflake.snowpark
import os

# Read the connection details from environment variables
connection_parameters = {
    "account": os.environ['SNOWPARK_ACCOUNT'],
    "user": os.environ['SNOWPARK_USER'],
    "password": os.environ['SNOWPARK_PASSWORD'],
    "role": os.environ['SNOWPARK_ROLE'],
    "warehouse": os.environ['SNOWPARK_WAREHOUSE'],
    "database": os.environ['SNOWPARK_DATABASE'],
    "schema": os.environ['SNOWPARK_SCHEMA']
}

session = Session.builder.configs(connection_parameters).create()

# Register a stored procedure in Snowflake, replacing any previous version
# and declaring the packages it needs on the server side
@sproc(name="udf_demo", replace=True, packages=["numpy", "xgboost", "snowflake-snowpark-python", "scikit-learn"])
def my_copy(session: snowflake.snowpark.Session) -> str:
    return session.get_current_account()

print("Running SPROC and UDF ...")
print(session.sql("call udf_demo()").collect())
print("Done SPROC and UDF ...")

Again, use the Play button to run this.