How to Detect if a File Has Changed
Sometimes you need to trigger a specific action in a pipeline only if a particular file in the project has changed since the last pipeline run in that environment.
This kind of logic is a little at odds with the idempotent properties that DataOps pipelines are generally expected to exhibit, so only use this technique if there is a valid reason to do so.
One valid reason is when a DataOps pipeline triggers an external system or API that has usage limits or a per-action cost model. In that case, it is cost-effective to interact with that system only when something has changed.
Limitations
Because this approach computes the file-change state inside a job while the pipeline is running, you cannot use rules logic to include or exclude jobs from the pipeline based on that state.
Implementation
- In your project, create a runner script to detect the file change:

runner-scripts/30-detect-file-change

#!/usr/bin/env bash
api_response=$(curl -s --header "PRIVATE-TOKEN: $ACCESS_TOKEN" "https://app.dataops.live/api/v4/projects/$CI_PROJECT_ID/pipelines?ref=$CI_COMMIT_REF_NAME&status=success&order_by=id&sort=desc")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "api_response: $api_response"; fi
last_commit=$(jq -r '.[0].sha' <<< "$api_response")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "last_commit: $last_commit"; fi
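# Compare the working tree against the commit of the last successful pipeline.
# -F treats the path as a literal string rather than a regular expression.
if git diff --name-only "$last_commit" | grep -qF -- "$FIND_CHANGED_FILE"; then
  echo "File $FIND_CHANGED_FILE HAS changed since the last successful pipeline, setting variable FILE_HAS_CHANGED"
  expose_key FILE_HAS_CHANGED '1'
else
  echo "File $FIND_CHANGED_FILE has NOT changed since the last successful pipeline"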
fi

This uses the variables ACCESS_TOKEN (a DataOps user access token) and FIND_CHANGED_FILE (the file path to examine), and sets the variable FILE_HAS_CHANGED if the specified file has changed since the last successful pipeline that ran on the same branch.

- Create another runner script (or adapt your existing runner script) that uses the FILE_HAS_CHANGED variable to decide whether to run the job's main activity or stop the script. Here is an example:

runner-scripts/50-do-something

#!/usr/bin/env bash
echo "Checking to see if the specified file has changed since the last commit"
if [[ -n "$FILE_HAS_CHANGED" ]]; then
  echo "The file HAS changed, let's do this thing"
else
  echo "No, the file did not change, we shall exit"
  exit 0
fi
echo "Here is where we do this thing......."
Alternatively, exit with a non-zero code (for example, exit 1) if you want the job, and therefore the pipeline, to fail. With exit 0, the pipeline continues after this job.
- Create a job to run your scripts. Here is an example:

pipelines/includes/local_includes/sample_job.yml

Do Something If File Has Changed:
  extends: .agent_tag
  image: $DATAOPS_UTILS_RUNNER_IMAGE
  stage: Data Transformation
  variables:
    ACCESS_TOKEN: DATAOPS_VAULT(MY.DATAOPS.ACCESS_TOKEN)
    FIND_CHANGED_FILE: path/to/file.ext
  script:
    - cp $CI_PROJECT_DIR/runner-scripts/* /runner-scripts/
    - /dataops

When this job runs, it copies both runner scripts into the runner, where they run in sequence as part of the /dataops entry point.
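
Two edge cases in the detection step may be worth guarding against. If no successful pipeline exists yet for the branch, the pipelines API returns an empty list and jq -r yields the literal string null instead of a commit SHA. A substring match on the path can also hit similarly named files, such as path/to/file.ext.bak. The following is a minimal sketch of two helpers that address these cases. The function names is_valid_sha and path_in_list are hypothetical and not part of DataOps, and the 40-character check assumes SHA-1 commit IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither is provided by DataOps.

# Succeeds only if $1 looks like a full 40-character SHA-1 commit ID
# (assumption: the repository uses SHA-1 object names).
# This rejects the string "null" that jq -r prints for a missing value.
is_valid_sha() {
  [[ "$1" =~ ^[0-9a-f]{40}$ ]]
}

# Reads changed paths on stdin, one per line, and succeeds only if
# one of them equals $1 exactly (-x whole line, -F literal string).
path_in_list() {
  grep -qxF -- "$1"
}

# Demo with canned data standing in for `git diff --name-only` output.
changed_files=$'README.md\npath/to/file.ext.bak'
if path_in_list "path/to/file.ext" <<< "$changed_files"; then
  echo "changed"
else
  echo "not changed"
fi
```

In the detection script, one way to use these would be to test is_valid_sha "$last_commit" before calling git diff, and to pipe the git diff --name-only output into path_in_list "$FIND_CHANGED_FILE". How to treat the "no previous pipeline" case (run the action or skip it) is a design decision that depends on the external system being called.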