Building MATE Models
Jinja is used extensively in MATE for model documentation, macros, and packages. However, Jinja is not exclusive to MATE; it is used throughout the DataOps.live data product platform as a template-rendering tool, including other orchestrators than MATE Transform Orchestrator. For an overview of DataOps Template Rendering, see Template Rendering in the DataOps Concepts section.
Now that we understand template rendering in DataOps, including some of the Jinja basics, let's dive into how MATE uses Jinja.
Models form the foundation of MATE. The data cannot be modeled and transformed without these building blocks. Therefore, we recommend you understand how to build MATE models with and without Jinja.
Using SQL to build MATE models
As described in the Models section in the MATE Core Concepts doc, the SQL SELECT statements are the foundation of a SQL MATE model. The best way to describe this is by considering the following example:
Let's assume you need to build two staging tables as a foundation for a downstream dimension table.
The accepted definition of a staging table is an intermediate table that sits in the modeling and transformation DAG (Directed Acyclic Graph) between a source table and a final data product. It includes light-touch transformations shaping the data into how you want it to be, including changes such as renaming columns and reformatting the values in a particular column. These light-touch transformations clean and standardize the data before transforming it into the final product downstream.
This SQL code sample shows how to create a model called stg_orders
:
stg_orders as (
select
id as order_id,
prod_type_code as product_type
items as items_ordered,
date as order_date
from
raw.stock_orders
)
And this SQL code snippet shows how to create a model called stg_product_types
:
stg_product_types as (
select
id as product_type_id,
code as product_type_code,
prod_type_desc as product_type_description
from
raw.product_types
)
To build more complex downstream models, you can then use these basic SQL models (stg_orders
and stg_product_types
).
For instance:
The following code references the stg_orders
and stg_product_types
models described above to build a model (called dim_total_orders_product_type
) that returns a list of orders per product type.
with product_types as (
select * from {{ ref('stg_product_types') }}
),
with orders as (
select * from {{ ref('stg_orders') }}
),
final as (
select product_types.product_type_code,
product_types.product_type_description,
coalesce(sum(orders.items_ordered), 0), as total_orders
from product_types
left join orders using (orders.product_type)
)
select * from final
Jinja is used to reference the stg_orders
and stg_product_types
models in this MATE Model.
Using Python to build MATE models
Prerequisite (dbt 1.4 or later)
As described in the Models section in the MATE Core Concepts documentation, dbt Python models help you solve use cases that can't be solved with SQL. The best way to describe this is by considering the following example:
Python models participate fully in the dbt DAG (Directed Acyclic Graph) of transformations. Use the dbt.ref()
method within a Python model to read data from other models (SQL or Python). If you want to read directly from a raw source table, use dbt.source()
. These methods return DataFrames pointing to the upstream source, model, seed, or snapshot.
Make sure you have set the dbt version with which to use MATE. See Switching dbt versions.
This SQL code sample shows how to create a SQL model called stg_orders
:
stg_orders as (
select
id as order_id,
prod_type_code as product_type
items as items_ordered,
date as order_date
from
raw.stock_orders
)
You can then build python models using the above basic SQL models (stg_orders
).
For instance:
The following code references the stg_orders
model to build a model (called stg_product_types
) that returns a limited list of orders for dev.
def model(dbt, session):
target_name = dbt.config.get("target_name")
specific_var = dbt.config.get("specific_var")
specific_env_var = dbt.config.get("specific_env_var")
orders_df = dbt.ref("stg_orders")
# limit data in dev
if target_name == "dev":
orders_df = orders_df.limit(500)
)
Defining functions
In addition to defining a model function, the Python model can import other functions or define its own. Here's an example on Snowpark, defining a custom add_one
function:
def add_one(x):
return x + 1
def model(dbt, session):
dbt.config(materialized="table")
temps_df = dbt.ref("temperatures")
# warm things up just a little
df = temps_df.withColumn("degree_plus_one", add_one(temps_df["degree"]))
return df
Currently, Python functions defined in one dbt model can't be imported and reused in other models.
You can reference your Python model in downstream SQL models using ref()
.
Building MATE models with Jinja
Jinja templating is extremely useful for simplifying complex models, as the following code example shows:
The original SQL is as follows:
with product_types as (
select * from {{ ref('stg_product_type') }}
),
final_stock_levels as (
select
stock_id,
sum(case when product_type = 'toy_trains' then stock_count else 0 end) as toy_trains_stock_count,
sum(case when product_type = 'toy_cars' then stock_count else 0 end) as toy_cars_stock_count,
sum(case when product_type = 'toy_dolls' then stock_count else 0 end) as toy_dolls_stock_count,
sum(case when product_type = 'toy_airplanes' then stock_count else 0 end) as toy_airplanes_stock_count
from stock_levels
group by 1
)
select * from final_stock_levels
And the refactored Jinja and SQL model code is as follows, simplifying the code by increasing its readability and maintainability:
{%- set product_type_names = ['toy_trains','toy_cars','toy_dolls','toy_airplanes'] -%}
with product_types_tbl as (
select * from {{ ref('stg_product_type') }}
),
final_stock_levels as (
select
stock_id,
{% for product_type in product_types -%}
sum(case when product_type = '{{ product_type }}' then stock_count else 0 end)
as {{ product_type}}_stock_count
{%- if not loop.last -%}
,
{% endif -%}
{%- endfor %}
from stock_levels
group by 1
)
select * from final_stock_levels
Auto-Generating model definitions
DataOps.live also makes available a convenient script — model-builder
— that automates generating model definitions.
The model-builder
script is provided as-is and without warranty. Although it is not a part of the data product platform,
it has been well tested, and our Support team will be happy to help if they can.
Prerequisites
To run the model-builder
script, you need to:
- Have a working Python 3.6+ environment — it has only been fully tested on Ubuntu Linux so far (Windows should work, but your mileage may vary)
- Install dbt locally, or at least create a
profiles.yml
as the script uses a local dbtprofiles.yml
file as the source of connection and authentication details
You can download and install the sources from the Model Builder project on DataOps — check the README for instructions.
To best use this script, you may need to have your DataOps project cloned locally (using Git over HTTPS).
Usage
The model-builder
script does not need to connect to Snowflake but will
instead, scan a local set of dbt source definitions and scaffold a set of basic SQL models.
usage: model-builder [-h] [-v] {from-sources,from-json} ...
Build dbt models from dbt source definitions.
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
subcommands:
{from-sources,from-json}
from-sources Build models from existing dbt source definitions
from-json Build model from JSON document