How to Detect if a File has Changed
Sometimes you may want to trigger an action in a pipeline only if a particular file in the project has changed since the last pipeline ran in that environment.
This kind of logic is a little at odds with the idempotent properties that DataOps pipelines are generally expected to exhibit, so only use this technique if there is a valid reason to do so.
A valid reason for this approach may be where a DataOps pipeline is triggering an external system/API that has usage limits or a per-action cost model. In that case, it would be cost-effective to only interact with that system if something has changed.
Since this approach relies on computing the state of a file change within a job, it is not possible to use rules logic to include or exclude jobs from a pipeline.
In your project, create a runner script to detect the file change:
runner-scripts/30-detect-file-change
# Fetch the successful pipelines for this branch, newest first
api_response=$(curl -s --header "PRIVATE-TOKEN: $ACCESS_TOKEN" "https://app.dataops.live/api/v4/projects/$CI_PROJECT_ID/pipelines?ref=$CI_COMMIT_REF_NAME&status=success&order_by=id&sort=desc")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "api_response: $api_response"; fi
# The first entry is the most recent successful pipeline; take its commit SHA
last_commit=$(jq -r '.[0].sha' <<< "$api_response")
if [[ -n "$DATAOPS_DEBUG" ]]; then echo "last_commit: $last_commit"; fi
# List the files that differ from that commit and look for the target file
if git diff --name-only "$last_commit" | grep "$FIND_CHANGED_FILE"; then
  echo "File $FIND_CHANGED_FILE HAS changed since the last commit, setting variable FILE_HAS_CHANGED"
  expose_key FILE_HAS_CHANGED '1'
else
  echo "File $FIND_CHANGED_FILE has NOT changed since the last commit"
fi
This script uses the variables ACCESS_TOKEN (a DataOps user access token) and FIND_CHANGED_FILE (the file path to examine). It sets the variable FILE_HAS_CHANGED if the specified file has changed since the last successful pipeline that ran on the same branch.
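The detection script can be tightened in two places. The sketch below is one possible way to do so, not the documented DataOps approach, and the helper names `is_commit_sha` and `path_in_list` are hypothetical. It assumes that commits are identified by 40-character SHA-1 hashes, and that when no successful pipeline exists yet for the branch the API returns an empty list, in which case `jq -r '.[0].sha'` prints `null`. It also matches the file path exactly, whereas a plain `grep` treats the path as a pattern and would also match longer paths that contain it.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if $1 looks like a full 40-character
# SHA-1 commit hash. Rejects the empty string and the literal "null" that
# jq -r prints when the pipeline list is empty.
is_commit_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a character that is not lowercase hex
  esac
  [ "${#1}" -eq 40 ]
}

# Hypothetical helper: read file paths on stdin (one per line, as produced by
# `git diff --name-only`) and succeed only if one line equals $1 exactly.
# -F: fixed string, -x: whole-line match, -q: no output.
path_in_list() {
  grep -Fxq -- "$1"
}
```

With these helpers, the condition in the detection script could be written as `if is_commit_sha "$last_commit" && git diff --name-only "$last_commit" | path_in_list "$FIND_CHANGED_FILE"; then`. How to treat the case where no previous commit is found (run the action or skip it) is a decision left to the project.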
Create another runner script (or adapt your existing runner script) that uses the FILE_HAS_CHANGED variable to decide whether to run the job's main activity or to stop the script. Here is an example:
runner-scripts/50-do-something
echo "Checking to see if the specified file has changed since the last commit"
# FILE_HAS_CHANGED is set by 30-detect-file-change only when the file differs
if [[ -n "$FILE_HAS_CHANGED" ]]; then
  echo "The file HAS changed, let's do this thing"
else
  echo "No, the file did not change, we shall exit"
  exit 0
fi
echo "Here is where we do this thing......."
If you want the job (and therefore the pipeline) to fail when the file has not changed, exit with a non-zero code instead of zero. With a zero exit code, the pipeline continues after this job.
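The choice between stopping quietly and failing the job can be made configurable. The following is a minimal sketch, assuming a hypothetical, user-defined variable `FAIL_IF_UNCHANGED` that is not part of DataOps. The hypothetical helper `unchanged_exit_code` prints the exit code the script should use when the file has not changed.

```shell
#!/bin/bash

# Hypothetical helper: print the exit code to use when the file is unchanged.
# FAIL_IF_UNCHANGED is an assumed variable, not a DataOps built-in.
unchanged_exit_code() {
  if [ -n "${FAIL_IF_UNCHANGED:-}" ]; then
    echo 1   # non-zero: the job, and therefore the pipeline, fails
  else
    echo 0   # zero: the job succeeds and the pipeline continues
  fi
}
```

In the runner script, the branch for an unchanged file would then end with `exit "$(unchanged_exit_code)"` in place of a fixed exit code.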
Create a job to run your scripts. Here is an example:
pipelines/includes/local_includes/sample_job.yml
Do Something If File Has Changed:
  stage: Data Transformation
  script:
    - cp $CI_PROJECT_DIR/runner-scripts/* /runner-scripts/
When this job runs, it will copy both runner scripts into the runner, so they will run in sequence as part of the