Job Artifacts
What are job artifacts?
Job artifacts capture the results of jobs that run in a pipeline. Artifacts are a list of files and directories that are generated by the job execution and are passed between jobs.
Artifacts in the DataOps.live ecosystem are used in the following ways:
- They store the as-is state of Snowflake after running the Snowflake Object Lifecycle Engine
- They persist the Snowflake database schema documentation, data lineage, and test results from the Modelling and Transformation Engine
- They represent data quality test results
- They capture DataOps orchestrator output for reporting purposes
All job artifacts are accessible from the UI at the end of the pipeline run.
Jobs generate artifacts
Assume you are building a custom job that extends the data product platform. You can base such a job on the Utils Orchestrator, for example:
Custom Job:
  extends:
    - .agent_tag
  image: $DATAOPS_UTILS_RUNNER_IMAGE
  variables:
    VAR1: value
  stage: Additional Configuration
  script:
    - echo "placeholder for some actual work"
    - mkdir -p ${CI_PROJECT_DIR}/job_output
    - echo "line 1 of summary" > ${CI_PROJECT_DIR}/job_output/summary.txt
  artifacts:
    name: My custom artifact
    when: always
    paths:
      - ${CI_PROJECT_DIR}/job_output/
  icon: ${UTIL_ICON}
The artifacts keyword section is new and is required to capture non-log output from this job as part of a pipeline. The details are as follows:
- name: the name for this job's list of artifacts
- paths: the list of files or directories constituting the artifacts
- when: always indicates that the job stores this artifact every time it runs, whether it succeeds or fails. If when is omitted, the artifacts are stored only on job success
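The when keyword described above can also be used to keep output only from failed runs. The following is a minimal sketch, assuming the platform accepts the GitLab-style values on_success, on_failure, and always for when; the job name, artifact name, and debug.txt file are hypothetical.

```yaml
# Hypothetical job: store diagnostic output only when the job fails
Custom Job With Failure Logs:
  extends:
    - .agent_tag
  image: $DATAOPS_UTILS_RUNNER_IMAGE
  stage: Additional Configuration
  script:
    - mkdir -p ${CI_PROJECT_DIR}/job_output
    - echo "diagnostic details" > ${CI_PROJECT_DIR}/job_output/debug.txt
    - echo "placeholder for some actual work"
  artifacts:
    name: Failure diagnostics
    when: on_failure   # assumed value; on_success is the behavior when the keyword is omitted
    paths:
      - ${CI_PROJECT_DIR}/job_output/
  icon: ${UTIL_ICON}
```

Writing the diagnostic file before the main work means it already exists if a later script step fails.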
Finding artifacts
You can access all artifacts of all jobs in a pipeline from the pipelines page. You can access the artifacts of an individual job from either:
- The job summary page (navigate to CI/CD → Jobs)
- The job execution detail page
Creating artifacts
Let us look at how to create artifacts in more detail. To use job (and pipeline) artifacts, add the artifacts keyword to your <pipeline>-ci.yml file. For instance, the following YAML config file shows how the artifacts are set up:
my reporting job:
  artifacts:
    name: My Job Report
    when: always
    paths:
      - /my_job_report
      - run_result.log
    expires_in: 1 week
In this example, the following details are relevant:
- A job called my reporting job runs and generates an artifact called My Job Report
- The paths keyword determines which directories and files to add to the job artifacts
- The when: always keyword indicates that the job stores this artifact every time it runs, whether it succeeds or fails. If when is omitted, the artifacts are stored only on job success
- The expires_in keyword determines how long these artifacts are kept before being marked for deletion
If you run two jobs concurrently in a single pipeline stage, the job that finishes last creates the artifact files.
If you want to disable artifact passing, so that a job does not receive artifacts from earlier jobs, define the job with empty dependencies as follows:
# set up artifact details
my reporting job:
  stage: build
  script: make build
  dependencies: []
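For contrast, the opposite case, where a later job deliberately consumes the artifacts of one earlier job, might look like the sketch below. It assumes GitLab-style semantics, in which dependencies lists the jobs whose artifacts are fetched. The job names, the deploy stage, and the scripts are hypothetical placeholders.

```yaml
# Hypothetical two-job sketch: one job produces an artifact, a later job consumes it
build report:
  stage: build
  script: make build > run_result.log
  artifacts:
    paths:
      - run_result.log

publish report:
  stage: deploy
  script: cat run_result.log
  dependencies:
    - build report   # fetch artifacts from this job only
```

Listing a job under dependencies narrows what is fetched, while the empty list shown earlier fetches nothing.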
If you want to create artifacts only for a given branch, use rules to build these artifacts, like the following example:
my reporting job:
  artifacts:
    name: My Job Report
    paths:
      - /my_job_report
  rules:
    - if: $CI_COMMIT_BRANCH == 'production'
In this scenario, the rule matches only when the variable $CI_COMMIT_BRANCH is set to production. Therefore, this job runs, and these artifacts are generated, only when the pipeline executes in the context of the production branch.
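The same approach can cover more than one branch. The fragment below is a sketch that assumes rules expressions support the || operator, as they do in GitLab CI. The branch name qa is a hypothetical addition.

```yaml
my reporting job:
  artifacts:
    name: My Job Report
    paths:
      - /my_job_report
  rules:
    # 'qa' is a hypothetical second branch; the job runs on either branch
    - if: $CI_COMMIT_BRANCH == 'production' || $CI_COMMIT_BRANCH == 'qa'
```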
Using pipeline variables with artifacts
It is also possible to use CI pipeline variables to dynamically define several of the details found in the artifacts section of the <pipeline>-ci.yml configuration file. Here are some simple examples:
The following code snippet uses ${CI_JOB_ID} as the dynamic artifact name:
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - /my_job_report
This example combines ${CI_PIPELINE_ID} and ${CI_JOB_ID} to build an entry under the artifact's paths keyword:
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - /${CI_PIPELINE_ID}/${CI_JOB_ID}
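Variables can also be combined into a more descriptive artifact name. The fragment below is a sketch that assumes the GitLab predefined variables CI_JOB_NAME and CI_COMMIT_REF_SLUG are available in the pipeline. Neither appears elsewhere on this page.

```yaml
my reporting job:
  artifacts:
    # Assumed predefined variables: job name plus a URL-safe form of the branch name
    name: "${CI_JOB_NAME}-${CI_COMMIT_REF_SLUG}"
    paths:
      - /my_job_report
```

A name built this way makes it easier to tell apart downloads that come from different jobs and branches.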
Excluding files from artifacts
It is possible to exclude specific files from being added to an artifact. For instance, let's assume we want to exclude all template.html files from an archive. To achieve this, add the exclude keyword to the pipeline YAML file as follows:
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - /${CI_PIPELINE_ID}/${CI_JOB_ID}
    exclude:
      - /templates/*template.html
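Exclusions are often easier to reason about when they sit under one of the listed paths. The fragment below is a sketch that assumes GitLab-style glob handling, in which exclude patterns are not searched recursively unless they contain a ** wildcard. The job_output directory and the .tmp extension are hypothetical.

```yaml
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - job_output/
    exclude:
      # Assumed glob syntax: ** matches nested directories under job_output
      - job_output/**/*.tmp
```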
Setting artifact retention period
Use the expires_in keyword in the <pipeline>-ci.yml file to specify how long the job artifacts are stored before they are deleted. If you don't set a value for this keyword, job artifacts are deleted after the default expiration time of 30 days.
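As an illustration, the fragment below sketches two retention settings. It assumes expires_in accepts GitLab-style natural-language durations and the value never. The job names are hypothetical.

```yaml
# Hypothetical jobs showing two retention settings
short lived report:
  artifacts:
    paths:
      - run_result.log
    expires_in: 2 hrs 20 min   # assumed duration syntax

permanent report:
  artifacts:
    paths:
      - run_result.log
    expires_in: never          # assumed value that disables expiry
```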
For details, see how to set artifact retention period.