
How to Build only Changed MATE Models

When you are actively developing MATE models in a feature branch, especially in a larger project, it can be frustrating to wait for every pipeline to build all of the project's models.

You can avoid this by adding a runner script that detects which models have changed since the branch's last successful pipeline, and running that script from a tweaked version of the Build all Models job (or your project's equivalent).

Implementation

  1. In your project, create a runner script to detect the model changes:

    runner-scripts/30-detect-model-changes.sh
    #!/usr/bin/env bash

    # Ask the API for this branch's successful pipelines, newest first
    api_response=$(curl -s --header "PRIVATE-TOKEN: $DATAOPS_ACCESS_TOKEN" "https://app.dataops.live/api/v4/projects/$CI_PROJECT_ID/pipelines?ref=$CI_COMMIT_REF_NAME&status=success&order_by=id&sort=desc")
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "api_response: $api_response"; fi

    # Commit SHA of the most recent one; jq prints "null" if there is none
    last_commit=$(jq -r '.[0].sha' <<< "$api_response")
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "last_commit: $last_commit"; fi

    echo "Grepping the diff against the last pipeline's commit for: $FIND_CHANGED_FILE"

    # Files changed since that commit (deleted files excluded) that match the pattern
    found_files=""
    if [[ -n "$last_commit" && "$last_commit" != "null" ]]; then
        found_files=$(git diff --name-only --diff-filter=d "$last_commit" HEAD | grep -i "$FIND_CHANGED_FILE")
    else
        echo "No previous successful pipeline found for $CI_COMMIT_REF_NAME"
    fi
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "found_files: $found_files"; fi

    # Reduce each path to a model name, e.g. dataops/modelling/models/marts/customers.sql -> customers
    changed_models=""
    while IFS= read -r line; do
        file_name="${line##*/}"
        model_name="${file_name%.*}"
        changed_models="$changed_models $model_name"
    done <<< "$found_files"
    changed_models=$(echo "$changed_models" | xargs)
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "changed_models: $changed_models"; fi

    if [[ -n "$changed_models" ]]; then
        expose_key TRANSFORM_MODEL_SELECTOR "$changed_models"
    else
        # Placeholder selector used when no matching models changed
        expose_key TRANSFORM_MODEL_SELECTOR "_"
    fi
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "TRANSFORM_MODEL_SELECTOR: $TRANSFORM_MODEL_SELECTOR"; fi

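    This script looks up the commit SHA of your branch's most recent successful pipeline and lists the files that changed between that commit and the current one. It filters that list with the pattern in FIND_CHANGED_FILE (set in step 3 to dataops/modelling/models/.*\.sql, which matches any SQL model), reduces each matching path to its model name (the file name without the .sql extension), and passes the space-separated names to TRANSFORM_MODEL_SELECTOR. If no matching files changed, the selector is set to the placeholder _. If the branch has no successful pipeline yet (for example, on its first pipeline), there is no commit to compare against, so the selector also falls back to _. The job's clone must also contain the old commit; with a shallow clone (a low GIT_DEPTH), the diff can fail.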

  2. Edit your existing Build all Models job (or equivalent) so that it only runs on the long-lived branches (master, qa and dev), by adding the rules shown below:

    pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml
    Build all Models:
      extends:
        - .modelling_and_transformation_base
        - .agent_tag
      variables:
        TRANSFORM_ACTION: RUN
      stage: Data Transformation
      script:
        - /dataops
      icon: ${TRANSFORM_ICON}
      rules:
        - if: '$CI_COMMIT_REF_NAME == "master" || $CI_COMMIT_REF_NAME == "qa" || $CI_COMMIT_REF_NAME == "dev"'

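  3. Add a new job, Build Changed Models ONLY, as a copy of Build all Models. In the new job, add the FIND_CHANGED_FILE variable and the two script lines that install the runner script, and invert the rules condition so that the job only runs on feature branches: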

    pipelines/includes/local_includes/modelling_and_transformation/build_all_models.yml
    Build all Models:
      ...

    Build Changed Models ONLY:
      extends:
        - .modelling_and_transformation_base
        - .agent_tag
      variables:
        TRANSFORM_ACTION: RUN
        FIND_CHANGED_FILE: dataops/modelling/models/.*\.sql
      stage: Data Transformation
      script:
        - cp $CI_PROJECT_DIR/runner-scripts/30-detect-model-changes.sh /runner-scripts/
        - chmod +x /runner-scripts/30-detect-model-changes.sh
        - /dataops
      icon: ${TRANSFORM_ICON}
      rules:
        - if: '$CI_COMMIT_REF_NAME != "master" && $CI_COMMIT_REF_NAME != "qa" && $CI_COMMIT_REF_NAME != "dev"'

  4. If you also want tests to run only on changed models, apply the same changes to your MATE testing jobs.

  5. Update variables.yml to include a reference to your DataOps access token (needed for the API call in the runner script):

    pipelines/includes/config/variables.yml
    variables:
      ...
      DATAOPS_ACCESS_TOKEN: DATAOPS_VAULT(PATH.TO.DATAOPS_ACCESS_TOKEN)
      ...
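The path filtering and model-name extraction from the step 1 script can be tried locally, without the API call or git, by feeding it a list of paths. The sketch below wraps that logic in a hypothetical `extract_models` function (not part of the original script) that reads paths from standard input; the sample paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the filtering and name extraction in 30-detect-model-changes.sh.
# extract_models is a hypothetical helper: it reads file paths on stdin,
# keeps those matching the pattern in $1, and prints the model names.
extract_models() {
    local pattern="$1" found_files line file_name changed_models=""
    found_files=$(grep -i "$pattern")
    while IFS= read -r line; do
        [[ -z "$line" ]] && continue                        # no matching files
        file_name="${line##*/}"                             # strip directories
        changed_models="$changed_models ${file_name%.*}"    # strip extension
    done <<< "$found_files"
    echo "$changed_models" | xargs                          # trim whitespace
}

# Hypothetical sample of `git diff --name-only` output
printf '%s\n' \
    "dataops/modelling/models/staging/stg_orders.sql" \
    "dataops/modelling/models/marts/customers.sql" \
    "dataops/modelling/models/marts/schema.yml" \
    "README.md" |
    extract_models 'dataops/modelling/models/.*\.sql'
```

Run with bash, this prints `stg_orders customers`: the YAML file and the README do not match the pattern, and each remaining path is reduced to its file name without the extension. Because only the file name is kept, models are selected by name regardless of which subfolder they live in.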

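When your pipeline runs in a feature branch, the new versions of the MATE jobs run instead, building (and, if you updated your testing jobs, testing) only the models that have changed since the commit of the branch's last successful pipeline. On master, qa and dev, all models are still built.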
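The two `rules` conditions are complementary, so for any branch exactly one of the two jobs is added to the pipeline. The selection can be sketched in shell with a hypothetical `job_for_branch` helper (not part of DataOps), which is handy for checking the logic when you add another long-lived branch:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the rules: conditions of the two jobs.
# It only illustrates which job a branch gets; DataOps does not use it.
job_for_branch() {
    case "$1" in
        master|qa|dev) echo "Build all Models" ;;
        *)             echo "Build Changed Models ONLY" ;;
    esac
}

for branch in master qa dev feature/new-orders-model; do
    printf '%-26s -> %s\n' "$branch" "$(job_for_branch "$branch")"
done
```

If you add a branch to the first job's condition, remember to exclude it in the second job's condition too, so the two rules stay mutually exclusive.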