Skip to main content

MATE Automated Data Testing

The value of data testing in MATE lies in the ability to make sure that the models that we build meet our assertions. Not only does MATE contain a set of standard tests as well as the ability to write as many custom test sets as you require, but the data testing process is automated.

MATE also contains a rich library of macros and additional test sets to help you write your tests to ensure that your data models are thoroughly tested to ensure the production data models' accuracy, validity, and trustworthiness.

In practice, MATE automated testing takes a set of tests defined in .yml files under /dataops/modelling/ and executes them as part of a standard/default MATE job. The standard automated testing YAML config file is as follows:

data_testing:
extends:
- .modelling_and_transformation_base
- .agent_tag
variables:
TRANSFORM_ACTION: TEST
stage: "Business Domain Validation Checks"
script:
- /dataops
icon: ${TRANSFORM_ICON}

Generic Tests

Generic tests are specified in a YAML file and can be set at a column level.

Additionally, the standard generic tests shipped with dbt comprise the following four types:

  • Unique
  • Not null
  • Accepted values
  • Relationships

Unique Test

Unique tests determine whether the values in a specific column are unique or not. It is typically described as follows:

version: 2

models:
- name: stg_product_type
columns:
- name: product_type_id
tests:
- unique

When run, this test will check every value in the product_type_id column and return an error if there are any duplicate values in the stg_product_type model.

Not Null Test

Not null (not_null) tests are similar to the unique test described above, except that they test whether any values in the specified column are null or not. The test passes if there are no null values and fails if at least one null value is in this column.

For example:

version: 2

models:
- name: stg_product_type
columns:
- name: product_type_id
tests:
- not_null

Accepted Values Test

The accepted values (accepted_values) test determines whether the specified column values are equal to one of the values in the list that accompanies the test declaration. For instance, this code snippet shows that the accepted_values test will check all the values in the product_type column to ensure all the values are in the values list below.

version: 2

models:
- name: stg_product_type
columns:
- name: product_type
tests:
- accepted_values:
values:
- toy_trains
- toy_cars
- toy_dolls
- toy_airplanes

Relationships Test

The relationship test relationships tests the primary - foreign key referential integrity of values in a column.

Wikipedia defines referential integrity as follows:

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute (either in the same or a different relation), then the referenced value must exist.

To test for referential integrity the test of the relationship is structured as follows:

version: 2

models:
- name: final_stock_levels
columns:
- name: product_type
tests:
- relationships:
to: ref('stg_product_type')
field: product_type

Singular Tests

Not only does MATE ship with [generic test types][(#generic-tests)], but it also includes the functionality to write singular (or stand-alone) tests. A singular test is a SQL query stored in a .sql file.

Let's assume you want to determine whether the stock levels in your stock level table are not negative. To test this determination, your SQL statement will need to look something like this:

select
stock_id
sum(stock_level) as total_stock_per_id
from {{ ref ('stock_levels')}}
group by stock_id
having not(total_stock_per_id < 0)

DataOps Package Tests

DataOps provides the package dataops adding tests for both model and column levels. The list includes:

Use them together with generic tests, and tests from bundled packages. Example:

calc_staff_member.yml
version: 2
models:
- name: calc_staff_member
tests:
# Check we have at least 250 rows
- dataops.minimum_rows:
rows: 250
# Check that each row in the Employees table/model made it into this one
- dataops.same_rows_as:
compare_model: ref('cur_employee')
columns:
- name: "JOB_TITLE"
tests:
# Job title should not be none
- not_null
# Expect at least 10 different job titles
- dataops.distinct_count_minimum: 10
minimum_count:
- name: "NAME"
tests:
# Name should not be none
- not_null
# Name should not be the same for everyone
- dbt_utils.not_constant

Note that:

  • the name needs to match the model name
  • the tests: next to the model name apply to the overall model (table or view)
  • the tests: under columns: apply to just those columns

DataOps Model-level Row Tests

The following row-level tests are used to test row counts

dataops.exact_rows

exact_rows tests that the exact number of rows in the given model (table, view) is as expected

  • parameter rows - number of rows

Usage:

version: 2
models:
- name: my_relation
tests:
# Check for exactly 250 rows
- dataops.exact_rows:
rows: 250

dataops.maximum_rows

maximum_rows tests that the maximum number of rows in a model does not exceed a given value

  • parameter rows - number of rows

Usage:

version: 2
models:
- name: my_relation
tests:
# Check for 250 or less rows
- dataops.maximum_rows:
rows: 250

dataops.minimum_rows

minimum_rows tests that the minimum number of rows in a model is never less than a given value

  • parameter rows - number of rows

Usage:

version: 2
models:
- name: my_relation
tests:
# Check for 250 or more rows
- dataops.minimum_rows:
rows: 250

DataOps Model-level Compare Tests

These tests compare the row count for a specific model against its reference model:

dataops.fewer_rows_than_or_equal

fewer_rows_than_or_equal tests that the number of rows in a model (table, view) must be less than or equal to a reference model

  • parameter compare_model - relation to another model or source

Usage:

version: 2
models:
- name: my_relation
tests:
- dataops.fewer_rows_than_or_equal:
compare_model: ref('other_relation')

dataops.fewer_rows_than

fewer_rows_thantests that the number of rows in a model (table, view) must be less than a reference model

  • parameter compare_model - relation to another model or source

Usage:

version: 2
models:
- name: my_relation
tests:
- dataops.fewer_rows_than:
compare_model: ref('other_relation')

dataops.same_rows_as

same_rows_as tests that the number of rows in a model (table, view) must be the same as a reference model

  • parameter compare_model - relation to another model or source

Usage:

version: 2
models:
- name: my_relation
tests:
- dataops.same_rows_as:
compare_model: ref('other_relation')

DataOps Column-level Tests

dataops.distinct_count_maximum

distinct_count_maximum tests the maximum number of distinct values in the specified column

  • parameter maximum_count - the count

Usage:

version: 2
models:
- name: my_relation
columns:
- name: COL_A
tests:
- dataops.distinct_count_maximum:
maximum_count: 250

dataops.distinct_count_minimum

distinct_count_minimum tests the minimum number of distinct values in the specified column

  • parameter minimum_count - the count

Usage:

version: 2
models:
- name: my_relation
columns:
- name: COL_A
tests:
- dataops.distinct_count_minimum:
minimum_count: 250

dataops.max_value_equals

max_value_equals tests the maximum value of a column to be equal to the parameter

  • parameter check_value - the exact value to match

Usage:

version: 2
models:
- name: my_relation
columns:
- name: COL_A
tests:
- dataops.max_value_equals:
check_value: 250

dataops.test_sql_condition

test_sql_condition tests a column against a given error condition (WHERE statement syntax)

  • parameter error_condition - SQL WHERE expression without the WHERE matching errors

Usage:

version: 2
models:
- name: my_relation
columns:
- name: COL_A
tests:
# match all distinct 3 character values as errors
- dataops.test_sql_condition:
error_condition: value_field like '___'

Tips

Combining Multiple Tests

It is important to note that you can specify more than one test for a particular column. The following code snippet demonstrates this:

version: 2

models:
- name: stg_product_type
columns:
- name: product_type_id
tests:
- unique
- not_null

When run, this script will first test whether all the values in the product_type_id column are unique, and then it will test whether all the values in the product_type_id column are not null.

Single Test YAML Files

MATE includes a single test definition .yml file that can cover many models; however, we recommend using a single .yml file for each model or SQL file. For example:

testing-yml-1 __shadow__