Job Artifacts

What are job artifacts?

Job artifacts capture the results of jobs that run in a pipeline. Artifacts are a list of files and directories that are generated by the job execution and are passed between jobs.

In the DataOps.live ecosystem, all job artifacts are accessible from the UI at the end of the pipeline run.

Jobs generate artifacts

Assume you are building a custom job that extends the data product platform. You can base it, for example, on the Utils Orchestrator:

pipelines/includes/local_includes/my_jobs/custom_job.yml
Custom Job:
  extends:
    - .agent_tag
  image: $DATAOPS_UTILS_RUNNER_IMAGE
  variables:
    VAR1: value
  stage: Additional Configuration
  script:
    - echo "placeholder for some actual work"
    - mkdir -p ${CI_PROJECT_DIR}/job_output
    - echo "line 1 of summary" > ${CI_PROJECT_DIR}/job_output/summary.txt
  artifacts:
    name: My custom artifact
    when: always
    paths:
      - ${CI_PROJECT_DIR}/job_output/
  icon: ${UTIL_ICON}

The artifacts section is required to capture non-log output from this job as part of a pipeline. Its keywords are as follows:

  • name - the name for this job's list of artifacts
  • paths - the list of files or directories constituting the artifacts
  • when - always indicates that the artifacts are stored every time the job runs, whether it succeeds or fails. If when is omitted, the artifacts are stored only when the job succeeds
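
For comparison, the following sketch stores artifacts only when the job fails, which can help keep diagnostic output without storing files from every successful run. It assumes that DataOps accepts the same when values as the underlying GitLab CI syntax (on_success, on_failure, always). The job name, the script run_checks.sh, and the log file name are hypothetical.

```yaml
Custom Job With Failure Output:
  extends:
    - .agent_tag
  image: $DATAOPS_UTILS_RUNNER_IMAGE
  stage: Additional Configuration
  script:
    - mkdir -p ${CI_PROJECT_DIR}/job_output
    # hypothetical script; its output is written to a file in the artifact directory
    - ./run_checks.sh > ${CI_PROJECT_DIR}/job_output/checks.log 2>&1
  artifacts:
    name: Failure diagnostics
    # assumed GitLab CI semantics: store artifacts only if the job fails
    when: on_failure
    paths:
      - ${CI_PROJECT_DIR}/job_output/
  icon: ${UTIL_ICON}
```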

Finding artifacts

You can access all artifacts of all jobs in a pipeline from the pipelines page. You can access the artifacts of an individual job from either:

  • The job summary page (navigate to CI/CD → Jobs)

    [Image: job summary page]

  • The job execution detail page

    [Image: find job artifacts details]

Creating artifacts

Let us look at how to create artifacts in more detail. To use job (and pipeline) artifacts, add the artifacts keyword to a job in your <pipeline>-ci.yml file. For instance, the following YAML config file shows how the artifacts are set up:

my-reporting-pipeline-ci.yml
my reporting job:
  artifacts:
    name: My Job Report
    when: always
    paths:
      - /my_job_report
      - run_result.log
    expires_in: 1 week

In this example, the following details are relevant:

  • A job called my reporting job runs and generates an artifact called My Job Report
  • The paths keyword determines which directories and files to add to the job artifacts
  • The when: always setting indicates that the artifacts are stored every time the job runs, whether it succeeds or fails. If when is omitted, the artifacts are stored only when the job succeeds
  • The expires_in keyword determines how long these artifacts are kept before being marked for deletion
Note: If you run two jobs concurrently in a single pipeline stage, the job that finishes last creates the artifact files.

If you want to disable artifact passing, define the job with empty dependencies as follows:

my-reporting-pipeline-ci.yml
# set up artifact details
my reporting job:
  stage: build
  script: make build
  dependencies: []
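
Conversely, a later job can limit which artifacts it downloads by listing specific jobs under dependencies. The sketch below assumes the standard GitLab CI behavior, in which a job downloads artifacts from all jobs in earlier stages unless dependencies says otherwise. The job names, stage names, and file paths are hypothetical, and the stages would need to exist in your project's stage list.

```yaml
build report:
  stage: build
  script:
    - mkdir -p report
    - echo "report body" > report/summary.txt
  artifacts:
    paths:
      - report/

publish report:
  stage: publish
  # download artifacts only from "build report", not from other earlier jobs
  dependencies:
    - build report
  script:
    - cat report/summary.txt
```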

If you want to create artifacts only for a given branch, use rules to build these artifacts, like the following example:

my-reporting-pipeline-ci.yml
my reporting job:
  artifacts:
    name: My Job Report
    paths:
      - /my_job_report
  rules:
    - if: $CI_COMMIT_BRANCH == 'production'

In this scenario, the rule checks whether the variable $CI_COMMIT_BRANCH equals production. The job therefore runs, and its artifacts are generated, only when the pipeline executes on the production branch.
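
To cover more than one branch, the condition can combine comparisons. This is a sketch that assumes the GitLab CI rules syntax, which supports the || operator in if expressions; the branch name qa is hypothetical.

```yaml
my reporting job:
  artifacts:
    name: My Job Report
    paths:
      - /my_job_report
  rules:
    # run the job (and so create artifacts) on either branch; "qa" is a hypothetical branch name
    - if: $CI_COMMIT_BRANCH == "production" || $CI_COMMIT_BRANCH == "qa"
```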

Using pipeline variables with artifacts

You can also use CI pipeline variables to define several of the values in the artifacts section of the <pipeline>-ci.yml configuration file dynamically. Here are some simple examples:

The following code snippet uses ${CI_JOB_ID} as the dynamic artifact name:

my-reporting-pipeline-ci.yml
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - /my_job_report

This example shows how ${CI_PIPELINE_ID} and ${CI_JOB_ID} together make up an entry under the artifact's paths keyword:

my-reporting-pipeline-ci.yml
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - /${CI_PIPELINE_ID}/${CI_JOB_ID}
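
Numeric IDs make artifact names unique but hard to recognize. As an alternative sketch, the name can be built from the job name and the branch or tag. This assumes the GitLab CI predefined variables CI_JOB_NAME and CI_COMMIT_REF_SLUG are available in DataOps pipelines; the path is illustrative.

```yaml
my reporting job:
  artifacts:
    # job name plus a URL-safe form of the branch or tag name
    name: "${CI_JOB_NAME}-${CI_COMMIT_REF_SLUG}"
    paths:
      - ${CI_PROJECT_DIR}/my_job_report
```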

Excluding files from artifacts

It is possible to exclude specific files from an artifact. For instance, assume you want to exclude all files in the templates directory whose names end in template.html. To do so, add the exclude keyword to the pipeline YAML file as follows:

my-reporting-pipeline-ci.yml
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - /${CI_PIPELINE_ID}/${CI_JOB_ID}
    exclude:
      - /templates/*template.html
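
A single * does not descend into subdirectories. The sketch below assumes the GitLab CI glob syntax, in which ** matches nested directories, and uses paths relative to the project directory. The job_output directory and the .tmp extension are hypothetical.

```yaml
my reporting job:
  artifacts:
    name: ${CI_JOB_ID}
    paths:
      - job_output/
    exclude:
      # skip temporary files in subdirectories at any depth below job_output
      - job_output/**/*.tmp
```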

Setting artifact retention period

Use the expires_in keyword in the <pipeline>-ci.yml file to specify how long job artifacts are stored before they are deleted. If you don't set a value for this keyword, job artifacts are deleted after the default expiration time of 30 days.

For details, see how to set artifact retention period.
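
As a rough illustration, the fragment below sets two different retention periods. It assumes DataOps accepts the same duration formats as GitLab CI, including the value never. The job names and paths are hypothetical, and stage and script are omitted for brevity.

```yaml
short lived report:
  artifacts:
    paths:
      - ${CI_PROJECT_DIR}/job_output/
    # marked for deletion 2 hours and 20 minutes after upload
    expires_in: 2 hrs 20 min

release report:
  artifacts:
    paths:
      - ${CI_PROJECT_DIR}/job_output/
    # assumed GitLab CI value: never expire automatically
    expires_in: never
```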