
Testing Approaches

Unlike Software Testing, Data Testing involves two fundamentally different types of testing. Whenever any type of testing is performed, the first question to ask is: what is being tested?

Let's consider a trivial case where we want a function that takes a currency conversion rate and an amount, and converts the amount to a second currency. There are two different things to test here. The first is the functional logic, i.e. does our currency converter work? The second is the actual data to be converted.

Unit Testing: Testing the Functional Logic

In Unit Testing, we need to test that our piece of code is doing what we expect it to do in a range of different scenarios and edge cases. This is essentially identical to the software concept of unit testing.
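As an illustration of the currency example, here is a minimal sketch of what such unit tests might look like. The `convert` function, its rounding to two decimal places, and its rejection of non-positive rates are assumptions made for this example, not part of DataOps itself. The tests are written as plain `assert` functions so that a runner such as pytest could collect them.

```python
from decimal import Decimal


def convert(amount: Decimal, rate: Decimal) -> Decimal:
    """Hypothetical converter: multiply amount by rate, round to 2 places."""
    if rate <= 0:
        # Assumed rule for this sketch: a rate must be strictly positive.
        raise ValueError("conversion rate must be positive")
    return (amount * rate).quantize(Decimal("0.01"))


def test_typical_amount():
    # Normal scenario: 100 units at a rate of 1.25.
    assert convert(Decimal("100"), Decimal("1.25")) == Decimal("125.00")


def test_zero_amount():
    # Edge case: converting nothing yields nothing.
    assert convert(Decimal("0"), Decimal("1.25")) == Decimal("0.00")


def test_rejects_non_positive_rate():
    # Edge case: an invalid rate should fail loudly, not return a number.
    try:
        convert(Decimal("100"), Decimal("0"))
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero rate")
```

Note that none of these tests touch real pipeline data. They only exercise the logic with hand-picked inputs.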

When does this make sense to perform? In general, we perform unit tests before we use the code live. Our production pipelines are actually moving data, and therefore relying on this functionality, and our qa branch is intended to mirror production, so we should not be running Unit Tests in these branches. These branches should be able to assume that the functional logic is working and focus on the data itself as the thing under test.

Unit testing in DataOps is fully explained here.

Data Quality Testing: Testing the Data Itself

In Data Quality Testing, we need to test the actual data flowing through a pipeline and ensure that it meets key bounds, limitations, and constraints. There isn't a software analogy for this.
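To make the contrast with unit testing concrete, here is a minimal sketch of data quality checks for the currency example, written in plain Python. The column names (`amount`, `currency`, `rate`), the allowed currency codes, and the specific rules are all assumptions for illustration. This is not how MATE or any other tool expresses its tests.

```python
# Hypothetical set of accepted currency codes for this sketch.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


def check_rows(rows):
    """Return a list of human-readable failures found in the data rows."""
    failures = []
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if amount is None:
            failures.append(f"row {i}: amount is null")
        elif amount < 0:
            failures.append(f"row {i}: amount is negative")

        if row.get("currency") not in ALLOWED_CURRENCIES:
            failures.append(f"row {i}: unexpected currency {row.get('currency')!r}")

        rate = row.get("rate")
        if rate is None or rate <= 0:
            failures.append(f"row {i}: rate is missing or not positive")
    return failures


# Example batch: the first row is clean, the second breaks three rules.
sample = [
    {"amount": 100.0, "currency": "USD", "rate": 1.25},
    {"amount": None, "currency": "XXX", "rate": 0},
]
problems = check_rows(sample)
```

Here the thing under test is the batch of rows, not the converter. The same checks can pass on one pipeline run and fail on the next, even though no code has changed.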

When does this make sense to perform? Our production pipelines are actually moving the data, and therefore relying on this functionality, and our qa branch is intended to mirror production, so these are the branches in which Data Quality Testing makes sense.

Data Quality Testing is a broad area, and there are a variety of different approaches. Most DataOps users use the built-in Data Quality capabilities provided by MATE. However, there are other excellent approaches provided by tools like Monte Carlo and Soda that can be included as fully orchestrated parts of a DataOps pipeline.

Summary

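We can therefore summarize a standard DataOps approach to Unit Testing vs Data Quality Testing as follows. Unit Testing tests the functional logic and is performed before the code is used live, not in the production or qa branches. Data Quality Testing tests the data itself and is performed in the production and qa branches.

[Figure: test types]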