How to Maintain DataOps Runner Disk Space

Clearing DataOps runner Docker images

The DataOps team does not have access to clients' runners for security reasons. However, we can offer tips for keeping your runner's environment healthy. A useful one is to run the docker system prune command regularly from a CRON job. Without extra flags, this command removes the following:

  • all stopped containers
  • all networks not used by at least one container
  • all dangling images
  • all dangling build cache

As the DataOps Runner is used, these items accumulate and can take up a large amount of disk space over time. The solution is simple:

Setting a CRON job to perform docker system prune

As a first step, create a file in the /etc/cron.weekly/ folder:

cd /etc/cron.weekly
sudo nano docker_system_prune.sh

Afterward, populate the file:

docker_system_prune.sh
#!/bin/bash
docker system prune --all --force

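Finally, make the file executable (for example with chmod +x), because cron only runs executable files from this folder. Note that the --all flag widens the cleanup from dangling images to every image not used by at least one container, and --force skips the confirmation prompt.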
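The steps above can be condensed into a small helper. This is only a sketch under stated assumptions: install_prune_job is a hypothetical function name, not part of DataOps or Docker, and the script is written without the .sh suffix because run-parts on Debian-based systems skips file names that contain a dot. On other distributions the original file name may work as it is.

```shell
#!/bin/bash
# Sketch: write the weekly prune script and make it executable.
# install_prune_job is a hypothetical helper, not part of DataOps or Docker.
install_prune_job() {
  # Target folder; defaults to the folder used in this article.
  local cron_dir="${1:-/etc/cron.weekly}"
  # Assumption: no ".sh" suffix, because Debian/Ubuntu run-parts skips names with dots.
  local target="$cron_dir/docker_system_prune"

  # Same two lines as the script shown above.
  printf '%s\n' '#!/bin/bash' 'docker system prune --all --force' > "$target" || return 1
  # cron only executes files that carry the executable bit.
  chmod 755 "$target" || return 1
  # Print the path of the installed script.
  echo "$target"
}
```

Call install_prune_job from a root shell, since /etc/cron.weekly is normally writable only by root. Afterward, run-parts --test /etc/cron.weekly lists the scripts that would be executed, which is a quick way to confirm the job is picked up.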

Clearing the DataOps runner persistent cache

DataOps uses the persistent_cache to store information between pipeline runs. This cache cannot be cleared from the data product platform, and clearing it automatically is inadvisable. Over time, however, it can accumulate a large amount of data, most of which will never be used again.

You will find the persistent_cache as a folder under agent_cache on your runner instance. Its structure looks like this:

/agent_cache/persistent_cache/PROJECT_NAME/BRANCH_NAME

You can safely delete, by hand, the cache of any branch that no longer exists or will no longer be used.
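As a rough aid for this manual cleanup, the sketch below lists the cache folder of every branch with its size, and removes a single folder on request. It assumes the PROJECT_NAME/BRANCH_NAME layout shown above and GNU find; list_branch_caches and remove_branch_cache are hypothetical helper names. Deciding which branches are obsolete remains a manual step.

```shell
#!/bin/bash
# Sketch: inspect and manually clean the persistent cache.
# Both functions are hypothetical helpers, not part of DataOps.

# Print "SIZE_IN_KIB<TAB>PATH" for every PROJECT_NAME/BRANCH_NAME folder, largest first.
list_branch_caches() {
  local cache_root="${1:-/agent_cache/persistent_cache}"
  # Depth 2 below the cache root is exactly the BRANCH_NAME level.
  find "$cache_root" -mindepth 2 -maxdepth 2 -type d -exec du -sk {} + | sort -rn
}

# Delete the cache of one branch after basic safety checks.
remove_branch_cache() {
  local cache_root="$1" project="$2" branch="$3"
  # Refuse empty arguments so a typo can never resolve to the cache root itself.
  if [ -z "$cache_root" ] || [ -z "$project" ] || [ -z "$branch" ]; then
    echo "usage: remove_branch_cache CACHE_ROOT PROJECT_NAME BRANCH_NAME" >&2
    return 1
  fi
  local target="$cache_root/$project/$branch"
  if [ ! -d "$target" ]; then
    echo "no such cache directory: $target" >&2
    return 1
  fi
  rm -rf -- "$target"
}
```

A typical session would call list_branch_caches first, compare the listed branches against those that still exist in the project, and then call remove_branch_cache /agent_cache/persistent_cache PROJECT_NAME BRANCH_NAME for each branch that is gone.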