How to Automatically Retry a Failed Job
Sometimes you may want a failed job to be retried automatically until it succeeds or reaches a set number of retries. This is useful when the resources your job depends on frequently have temporary issues. If these issues resolve themselves after a certain amount of time, consider using automatic retries for your jobs. You then no longer have to manually retry a failed job in a DataOps pipeline, and the pipeline can heal itself.
Use the retry keyword to configure how many times you want to reprocess a failed job. Values you can set are 0, 1, or 2. If the value isn't defined, it defaults to 0. Here is an example job:
Test all Sources:
  extends:
    - .modelling_and_transformation_base
    - .agent_tag
  variables:
    TRANSFORM_ACTION: TEST
    TRANSFORM_MODEL_SELECTOR: source:*
  stage: Source Testing
  script:
    - /dataops
  retry: 2
  icon: ${TESTING_ICON}
  artifacts:
    when: always
    reports:
      junit: $CI_PROJECT_DIR/report.xml
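For a job that only occasionally hits a transient failure, a single retry may be enough. The following is a minimal sketch rather than a recommended configuration: the job name is hypothetical, and the base job, stage, and script are simply reused from the example above.

```yaml
# Hypothetical job name, used only to illustrate a smaller retry value.
Test Flaky Source:
  extends:
    - .agent_tag
  stage: Source Testing
  script:
    - /dataops
  # One automatic retry: the job is reprocessed at most once after a failure.
  retry: 1
```

Starting with the smallest value that covers the transient issue keeps failed runs short and makes recurring failures easier to notice.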
Be careful not to overuse the retry keyword. If your pipeline fails often, it is better to debug it and rethink its logic than to accept prolonged runs with 1 or 2 retries.