How to Detect if a File Has Changed

Sometimes you need to trigger a specific action in a pipeline only if a particular file in the project has changed since the last pipeline run in that environment.

remember

This kind of logic is a little at odds with the idempotent properties that DataOps pipelines are generally expected to exhibit, so only use this technique if there is a valid reason to do so.

One valid reason is a DataOps pipeline that triggers an external system or API with usage limits or a per-action cost model. In that case, it is cost-effective to interact with that system only when something has changed.

Limitations

Because this approach computes the change state of a file inside a running job, you cannot use rules logic to include or exclude jobs from the pipeline based on that state.

Implementation

  1. In your project, create a runner script to detect the file change:

    runner-scripts/30-detect-file-change
    #!/usr/bin/env bash

    # Ask the API for the most recent successful pipeline on this branch
    api_response=$(curl -s --header "PRIVATE-TOKEN: $ACCESS_TOKEN" "https://app.dataops.live/api/v4/projects/$CI_PROJECT_ID/pipelines?ref=$CI_COMMIT_REF_NAME&status=success&order_by=id&sort=desc")
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "api_response: $api_response"; fi

    # Commit SHA that the last successful pipeline ran against
    last_commit=$(jq -r '.[0].sha' <<< "$api_response")
    if [[ -n "$DATAOPS_DEBUG" ]]; then echo "last_commit: $last_commit"; fi

    # List the files that differ from that commit and look for the target path
    # (-F treats the path as a literal string rather than a regular expression)
    if git diff --name-only "$last_commit" | grep -F -- "$FIND_CHANGED_FILE"; then
      echo "File $FIND_CHANGED_FILE HAS changed since the last successful pipeline, setting variable FILE_HAS_CHANGED"
      expose_key FILE_HAS_CHANGED '1'
    else
      echo "File $FIND_CHANGED_FILE has NOT changed since the last successful pipeline"
    fi

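    This script uses the variables ACCESS_TOKEN (a DataOps user access token) and FIND_CHANGED_FILE (the file path to examine). It sets the variable FILE_HAS_CHANGED if the specified file has changed since the last successful pipeline that ran on the same branch. If no successful pipeline exists yet for the branch, jq prints null, the git diff typically fails, and FILE_HAS_CHANGED is not set.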

  2. Create another runner script (or adapt your existing runner script), using the FILE_HAS_CHANGED variable to decide whether to run the job's main activity or stop the script. Here is an example:

    runner-scripts/50-do-something
    #!/usr/bin/env bash

    echo "Checking to see if the specified file has changed since the last successful pipeline"
    if [[ -n "$FILE_HAS_CHANGED" ]]; then
      echo "The file HAS changed, let's do this thing"
    else
      echo "No, the file did not change, we shall exit"
      # Exit code 0 stops the script without failing the job
      exit 0
    fi

    echo "Here is where we do this thing......."

Alternatively, exit with a non-zero code if you want the job, and therefore the pipeline, to fail when the file has not changed. With exit code 0, the pipeline continues after this job.
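As a rough sketch of the failing variant, the check can be wrapped in a small function whose return code becomes the script's exit code. The function name `require_file_change` is hypothetical and not part of DataOps; only the `FILE_HAS_CHANGED` variable comes from the scripts above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when FILE_HAS_CHANGED is set and non-empty.
require_file_change() {
  if [[ -n "${FILE_HAS_CHANGED:-}" ]]; then
    echo "The file HAS changed, continuing"
    return 0
  fi
  # Report on stderr so the reason shows up in the job log as an error
  echo "The file did not change, failing the job" >&2
  return 1
}

# In a runner script, a non-zero exit marks the job as failed:
#   require_file_change || exit 1
```

Whether to exit 0 or non-zero depends on how downstream jobs should react: a failed job normally stops later stages, while exit 0 lets them run.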

  3. Create a job to run your scripts. Here is an example:

    pipelines/includes/local_includes/sample_job.yml
    Do Something If File Has Changed:
      extends: .agent_tag
      image: $DATAOPS_UTILS_RUNNER_IMAGE
      stage: Data Transformation
      variables:
        ACCESS_TOKEN: DATAOPS_VAULT(MY.DATAOPS.ACCESS_TOKEN)
        FIND_CHANGED_FILE: path/to/file.ext
      script:
        - cp $CI_PROJECT_DIR/runner-scripts/* /runner-scripts/
        - /dataops

    When this job runs, it copies both runner scripts into the runner, where they run in sequence as part of the /dataops entry point.
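The detection script matches `FIND_CHANGED_FILE` against the list of changed paths with `grep`, which also accepts partial matches, so `path/to/file.ext` would match a changed `path/to/file.ext.bak`. If only an exact path should count, a stricter check could look like the sketch below. The function name `path_in_list` and the sample file list are made up for illustration; `-F` (fixed string), `-x` (whole line) and `-q` (quiet) are standard `grep` options.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a newline-separated list of paths on stdin and
# succeeds only if one line is exactly equal to the path given as $1.
path_in_list() {
  grep -Fxq -- "$1"
}

# Possible use in the detection script (last_commit as computed there):
#   if git diff --name-only "$last_commit" | path_in_list "$FIND_CHANGED_FILE"; then ...

# Self-contained demonstration with a made-up list of changed files
changed_files=$'docs/readme.md\npath/to/file.ext.bak'

if path_in_list "path/to/file.ext" <<< "$changed_files"; then
  echo "exact match"
else
  echo "no exact match"   # this branch runs: only file.ext.bak is in the list
fi
```

The looser substring match may still be what you want when `FIND_CHANGED_FILE` names a directory prefix rather than a single file.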