Skip to main content

Building MATE Models

Jinja is used extensively in MATE for model documentation, macros, and packages. However, Jinja is not exclusive to MATE; it is used throughout the DataOps platform as a template-rendering tool, including other orchestrators than MATE Transform Orchestrator. For an overview of DataOps Template Rendering, see Template Rendering in the DataOps Concepts section.

Now that we understand template rendering in DataOps, including some of the Jinja basics, let's dive into how MATE uses Jinja.

Models form the foundation of MATE. The data cannot be modeled and transformed without these building blocks. Therefore, we recommend you understand how to build MATE models with and without Jinja.

Using SQL to build MATE models

As described in the Models section in the MATE Core Concepts doc, the SQL SELECT statements are the foundation of a SQL MATE model. The best way to describe this is by considering the following example:

Let's assume you need to build two staging tables as a foundation for a downstream dimension table.

The accepted definition of a staging table is an intermediate table that sits in the modeling and transformation DAG (Directed Acyclic Graph) between a source table and a final data product. It includes light-touch transformations shaping the data into how you want it to be, including changes such as renaming columns and reformatting the values in a particular column. These light-touch transformations clean and standardize the data before transforming it into the final product downstream.

This SQL code sample shows how to create a model called stg_orders:

stg_orders.sql
stg_orders as (
select
id as order_id,
prod_type_code as product_type
items as items_ordered,
date as order_date
from
raw.stock_orders
)

And this SQL code snippet shows how to create a model called stg_product_types:

stg_product_types.sql
stg_product_types as (
select
id as product_type_id,
code as product_type_code,
prod_type_desc as product_type_description
from
raw.product_types
)

You can then use these basic SQL models (stg_orders and stg_product_types) to build more complex downstream models.

For instance:

The following code references the stg_orders and stg_product_types models described above to build a model (called dim_total_orders_product_type) that returns a list of orders per product type.

dim_total_orders_product_type.sql
with product_types as (
select * from {{ ref('stg_product_types') }}
),

with orders as (
select * from {{ ref('stg_orders') }}
),

final as (
select product_types.product_type_code,
product_types.product_type_description,
coalesce(sum(orders.items_ordered), 0), as total_orders

from product_types
left join orders using (orders.product_type)
)

select * from final
note

Jinja is used to reference the stg_orders and stg_product_typesmodels in this MATE Model.

Feature release status badge: PriPrev
PriPrev
Using Python to build MATE models

As described in the Models section in the MATE Core Concepts doc, dbt Python models help you solve use cases that can't be solved with SQL. The best way to describe this is by considering the following example:

Python models participate fully in the dbt DAG (Directed Acyclic Graph) of transformations. Use the dbt.ref() method within a Python model to read data from other models (SQL or Python). If you want to read directly from a raw source table, use dbt.source(). These methods return DataFrames pointing to the upstream source, model, seed, or snapshot.

To benefit from this feature, you must enable the dbt 1.3 private preview in your branch by adding the following variable to the project variables:

pipelines/includes/config/variables.yml
variables:
DATAOPS_RUNNER_IMAGE_TAG: pripre-dbt1.3

The next pipeline on the branch will use the orchestrator version, which has the dbt 1.3 preview enabled.

This SQL code sample shows how to create a SQL model called stg_orders:

stg_orders.sql
stg_orders as (
select
id as order_id,
prod_type_code as product_type
items as items_ordered,
date as order_date
from
raw.stock_orders
)

You can then use the above basic SQL models (stg_orders) to build python models.

For instance:

The following code references the stg_orders model to build a model (called stg_product_types) that returns a limited list of orders for dev.

stg_product_types.py
def model(dbt, session):
target_name = dbt.config.get("target_name")
specific_var = dbt.config.get("specific_var")
specific_env_var = dbt.config.get("specific_env_var")

orders_df = dbt.ref("stg_orders")

# limit data in dev
if target_name == "dev":
orders_df = orders_df.limit(500)
)

Defining functions

In addition to defining a model function, the Python model can import other functions or define its own. Here's an example on Snowpark, defining a custom add_one function:

stg_product_types.py
def add_one(x):
return x + 1

def model(dbt, session):
dbt.config(materialized="table")
temps_df = dbt.ref("temperatures")

# warm things up just a little
df = temps_df.withColumn("degree_plus_one", add_one(temps_df["degree"]))
return df

Currently, Python functions defined in one dbt model can't be imported and reused in other models.

note

You can reference your Python model in downstream SQL models using ref().

To know more about the dbt 1.3 private preview, check out dbt-1.3 Preview.

Building MATE models with Jinja

Jinja templating is extremely useful for simplifying complex models, as the following code example shows:

The original SQL is as follows:

with product_types as (
select * from {{ ref('stg_product_type') }}
),

final_stock_levels as (
select
stock_id,

sum(case when product_type = 'toy_trains' then stock_count else 0 end) as toy_trains_stock_count,
sum(case when product_type = 'toy_cars' then stock_count else 0 end) as toy_cars_stock_count,
sum(case when product_type = 'toy_dolls' then stock_count else 0 end) as toy_dolls_stock_count,
sum(case when product_type = 'toy_airplanes' then stock_count else 0 end) as toy_airplanes_stock_count

from stock_levels

group by 1
)

select * from final_stock_levels

And the refactored Jinja and SQL model code is as follows, simplifying the code by increasing its readability and maintainability:

{%- set product_type_names = ['toy_trains','toy_cars','toy_dolls','toy_airplanes'] -%}

with product_types_tbl as (
select * from {{ ref('stg_product_type') }}
),

final_stock_levels as (
select
stock_id,

{% for product_type in product_types -%}

sum(case when product_type = '{{ product_type }}' then stock_count else 0 end)
as {{ product_type}}_stock_count

{%- if not loop.last -%}
,
{% endif -%}

{%- endfor %}

from stock_levels
group by 1
)

select * from final_stock_levels