How to Build Only Changed MATE Models
When actively developing MATE models in a feature branch, especially with larger projects, it can be frustrating to have to wait for each pipeline to build all the project's models.
You can alleviate this by writing a runner script that detects which models have changed in each pipeline run, and adding that script to a tweaked version of the Build all Models job (or your project's equivalent).
- In your project, create a runner script to detect the model changes:
runner-scripts/30-detect-model-changes.sh
#!/usr/bin/env bash
# Fetch this branch's successful pipelines, newest first
api_response=$(curl -s --header "PRIVATE-TOKEN: $DATAOPS_ACCESS_TOKEN" "https://app.dataops.live/api/v4/projects/$CI_PROJECT_ID/pipelines?ref=$CI_COMMIT_REF_NAME&status=success&order_by=id&sort=desc")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "api_response: $api_response"; fi
last_commit=$(jq -r '.[0].sha' <<< "$api_response")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "last_commit: $last_commit"; fi
echo "Grepping the diff against the last pipeline's commit for: $FIND_CHANGED_FILE"
# List files changed since that commit and keep only paths matching the pattern
found_files=$(git diff --name-only "$last_commit" | grep -i "$FIND_CHANGED_FILE")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "found_files: $found_files"; fi
changed_models=""
# Reduce each matching path to a bare model name (no directories, no extension)
while IFS= read -r line; do
  file_name="${line##*/}"
  model_name="${file_name%.*}"
  changed_models="$changed_models $model_name"
done <<< "$found_files"
changed_models=$(echo "$changed_models" | xargs)
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "changed_models: $changed_models"; fi
if [[ -n "$changed_models" ]]; then
  expose_key TRANSFORM_MODEL_SELECTOR "$changed_models"
else
  expose_key TRANSFORM_MODEL_SELECTOR "_"
fi
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "TRANSFORM_MODEL_SELECTOR: $TRANSFORM_MODEL_SELECTOR"; fi
This script finds the commit SHA of your branch's most recent successful pipeline and computes the list of filenames that differ between that commit and the current one. The list is then filtered with a path pattern, in this case dataops/modelling/models/.*\.sql, to find any changed SQL models, and the resulting model names are passed straight into TRANSFORM_MODEL_SELECTOR. If no models match, the selector is set to "_".
- Edit your existing Build all Models job (or equivalent) to ensure it will only run on non-feature branches (dev, qa, and main).
pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml
Build all Models:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
  stage: Data Transformation
  script:
    - /dataops
  icon: ${TRANSFORM_ICON}
  rules:
    - if: '$CI_COMMIT_REF_NAME == "main" || $CI_COMMIT_REF_NAME == "qa" || $CI_COMMIT_REF_NAME == "dev"'
- Add a new job as a copy of Build all Models, with the following changes: the FIND_CHANGED_FILE variable, two extra script lines that install the runner script, and an inverted rule so that it runs only on feature branches:
pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml
Build all Models: ...
Build Changed Models ONLY:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: RUN
    FIND_CHANGED_FILE: dataops/modelling/models/.*\.sql
  stage: Data Transformation
  script:
    - cp $CI_PROJECT_DIR/runner-scripts/30-detect-model-changes.sh /runner-scripts/
    - chmod +x /runner-scripts/30-detect-model-changes.sh
    - /dataops
  icon: ${TRANSFORM_ICON}
  rules:
    - if: '$CI_COMMIT_REF_NAME != "main" && $CI_COMMIT_REF_NAME != "qa" && $CI_COMMIT_REF_NAME != "dev"'
- In the same way, if you want tests to only run on changed models, also update MATE testing jobs with this logic.
- Update variables.yml to include a reference to your DataOps access token (needed for the API call in the runner script):
pipelines/includes/config/variables.yml
variables:
  ...
  DATAOPS_ACCESS_TOKEN: DATAOPS_VAULT(PATH.TO.DATAOPS_ACCESS_TOKEN)
  ...
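Because the runner script depends on DATAOPS_ACCESS_TOKEN for its API call, it may help to check that the variable actually resolved before calling curl. The following is a minimal sketch, not part of DataOps: require_var is a hypothetical helper name, and how you react to a missing token (skip detection, fail the job) is left to your project.

```shell
#!/usr/bin/env bash
# Sketch: report when a required variable is missing or empty.
# require_var is a hypothetical helper, not a DataOps function.

require_var() {
  local name="$1"
  # ${!name} is bash indirect expansion: the value of the variable named by $name.
  if [[ -z "${!name:-}" ]]; then
    echo "ERROR: $name is not set or is empty" >&2
    return 1
  fi
}

# Possible usage near the top of the runner script (assumption):
# require_var DATAOPS_ACCESS_TOKEN || echo "Skipping change detection" >&2
```

The helper only returns a status code and never exits, so it behaves the same whether the runner script is executed or sourced.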
When your pipeline runs in a feature branch, the new versions of the MATE jobs will run, only building/testing models that have changed since the previous pipeline's commit.
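To see how changed file paths become a model selector without running a pipeline, the filtering and name-extraction logic can be tried locally. This is a minimal sketch under stated assumptions: build_selector is a hypothetical helper that mirrors the runner script's grep and parameter-expansion steps, and the sample paths are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of changed file paths into a model selector string.
# build_selector is a hypothetical helper, not part of DataOps.

build_selector() {
  local pattern="$1"   # regex for model files, as in FIND_CHANGED_FILE
  local paths="$2"     # newline-separated paths, as printed by `git diff --name-only`
  local line file_name model_name selector=""

  while IFS= read -r line; do
    # Skip paths that do not match the pattern (case-insensitive, like grep -i).
    grep -qi -- "$pattern" <<< "$line" || continue
    file_name="${line##*/}"        # strip directories: a/b/customers.sql -> customers.sql
    model_name="${file_name%.*}"   # strip extension:   customers.sql     -> customers
    selector="${selector:+$selector }$model_name"
  done <<< "$paths"

  # Fall back to "_" when nothing matched, as the runner script does.
  echo "${selector:-_}"
}

# Invented sample paths; only the two .sql files under the models directory match.
sample_paths=$'dataops/modelling/models/staging/stg_orders.sql\nREADME.md\ndataops/modelling/models/marts/customers.sql'
build_selector 'dataops/modelling/models/.*\.sql' "$sample_paths"
```

With the sample paths above, the function prints `stg_orders customers`. With an empty list of paths, it prints `_`.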