
Testing Approaches

Unlike software testing, data testing involves two fundamentally different types of testing. Whenever any type of testing is performed, the first consideration is: what is being tested?

Let's consider a trivial case where we want a function that takes a currency conversion rate and an amount and converts the amount to a second currency. There are two different things to test here: the functional logic (does our currency converter work?) and the actual data to convert.

Unit testing: Testing the functional logic

In unit testing, we need to test that our piece of code is doing what we expect it to do in various scenarios and edge cases. This is identical to the software concept of unit testing.
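As a minimal sketch of what such unit tests could look like for the currency converter described above: the function name convert, its signature, and the rounding rule (two decimal places, half up) are assumptions made for illustration and do not come from DataOps itself.

```python
from decimal import Decimal, ROUND_HALF_UP


def convert(amount: Decimal, rate: Decimal) -> Decimal:
    """Convert an amount into a second currency (hypothetical helper)."""
    if rate <= 0:
        # Assumption: a zero or negative conversion rate is always a bug.
        raise ValueError("rate must be positive")
    # Assumption: results are rounded to two decimal places, half up.
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def test_typical_conversion():
    assert convert(Decimal("100"), Decimal("0.85")) == Decimal("85.00")


def test_zero_amount():
    assert convert(Decimal("0"), Decimal("1.1")) == Decimal("0.00")


def test_rounds_half_up():
    assert convert(Decimal("1.005"), Decimal("1")) == Decimal("1.01")


def test_rejects_non_positive_rate():
    try:
        convert(Decimal("10"), Decimal("-1"))
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative rate")
```

Note that every input here is fixed and hand-written. The tests exercise the logic only and never touch the data that flows through a pipeline.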

When does this make sense to perform? In general, we perform unit tests before we use the code live. Since our production pipelines are moving data and therefore using this functionality, and our qa branch is intended to mirror production, we should not run unit tests in these branches. These branches should be able to assume that the functional logic is working and focus on the data as the thing under test.

See DataOps Unit Testing for complete details.

Data quality testing: Testing the data itself

In data quality testing, we test the actual data flowing through a pipeline and ensure that it meets key bounds, limitations, and constraints. There isn't a software analogy for this.
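To illustrate the difference, here is a rough sketch of data quality checks applied to the rows themselves rather than to the conversion logic. The column names (amount, rate, currency), the allowed currency list, and the check_rows helper are all hypothetical. This is plain Python for illustration only and is not how the built-in data quality capabilities of any particular tool are expressed.

```python
from typing import Any

# Hypothetical reference list of currencies we expect to see in the data.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


def check_rows(rows: list[dict[str, Any]]) -> list[str]:
    """Return one message per constraint violation found in the data."""
    failures = []
    for i, row in enumerate(rows):
        # Constraint: amount must be present.
        if row.get("amount") is None:
            failures.append(f"row {i}: amount is null")
        # Bound: conversion rate must be a positive number.
        rate = row.get("rate")
        if rate is None or rate <= 0:
            failures.append(f"row {i}: rate must be positive, got {rate!r}")
        # Constraint: currency must be one of the accepted values.
        if row.get("currency") not in ALLOWED_CURRENCIES:
            failures.append(f"row {i}: unexpected currency {row.get('currency')!r}")
    return failures


rows = [
    {"amount": 10.0, "rate": 0.85, "currency": "EUR"},
    {"amount": None, "rate": -1, "currency": "XXX"},
]
failures = check_rows(rows)
```

With the sample rows above, the first row passes every check and the second row violates all three, so failures holds three messages. The thing under test is the data: the same checks would run on every pipeline execution, because the data changes even when the code does not.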

When does this make sense to perform? Since our production pipelines are actually moving the data, and therefore using this functionality, and our qa branch is intended to mirror production, these are the branches in which data quality testing makes sense.

Data quality testing is a broad area and there are a variety of different approaches. Most DataOps users use the built-in data quality capabilities provided by MATE. However, there are other excellent approaches provided by tools like Monte Carlo and Soda that can be included as fully orchestrated parts of a DataOps pipeline.

Summary

We can therefore summarize a standard DataOps approach to unit testing vs data quality testing as:

[Figure: test-types]