How to Maintain DataOps Runner Disk Space
Clearing DataOps runner Docker images
The DataOps team does not have access to clients' runners for security reasons. However, we can offer tips for keeping your runner's environment healthy. A handy one is to run the docker system prune command regularly with a cron job. By default, this command removes the following:
- all stopped containers
- all networks not used by at least one container
- all dangling images
- all dangling build cache
As the DataOps runner is used, these items accumulate and can take up a large amount of disk space over time. The solution is simple:
Setting a CRON job to perform docker system prune
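As a first step, create a file in the /etc/cron.weekly/ folder:
cd /etc/cron.weekly
sudo nano docker_system_prune
The file name has no .sh extension because, on Debian and Ubuntu, run-parts skips files in this folder whose names contain a dot.
Afterward, populate the file. The --all flag also removes every image not used by a container, not only dangling ones, and --force skips the confirmation prompt:
#!/bin/bash
docker system prune --all --force
Finally, make the file executable so that cron can run it:
sudo chmod +x /etc/cron.weekly/docker_system_prune
All done!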
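To see how much space each weekly run frees, the script can print Docker's disk usage before and after pruning. The following is a minimal sketch under that assumption, not an official DataOps script. The function name prune_and_report is hypothetical. docker system df is the standard Docker command for reporting disk usage.

```shell
#!/bin/bash
# Sketch of a more verbose weekly prune script (illustration only).
# prune_and_report is a hypothetical helper name.

prune_and_report() {
    # Skip quietly on hosts where the docker CLI is not available.
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found, nothing to prune"
        return 0
    fi

    echo "Disk usage before prune:"
    docker system df

    # Same command as in the cron script above.
    docker system prune --all --force

    echo "Disk usage after prune:"
    docker system df
}

prune_and_report
```

Cron usually mails or logs whatever a job prints, so comparing the two reports shows roughly how much space a run reclaimed.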
Clearing the DataOps runner persistent cache
DataOps uses the persistent_cache to store information between pipeline runs. This cache cannot be cleared from the data product platform, and clearing it automatically is inadvisable. Over time, however, it can accumulate a large amount of data, most of which will never be used again.
You will find the persistent_cache as a folder under the agent_cache on your runner instance. It has a structure like the following example:
/agent_cache/persistent_cache/PROJECT_NAME/BRANCH_NAME
You can safely delete the cache manually for branches that no longer exist or will no longer be used.
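Before deleting anything, it can help to see which branch folders exist and how large they are. The sketch below assumes the example structure above, with each branch stored as one folder directly under its project. The default CACHE_ROOT path and the list_branch_caches function name are illustrative assumptions, so adjust them to your runner.

```shell
#!/bin/bash
# Sketch: list persistent_cache folders per project and branch with their size.
# CACHE_ROOT follows the example structure above and is an assumption.
CACHE_ROOT="${CACHE_ROOT:-/agent_cache/persistent_cache}"

list_branch_caches() {
    local root="$1"
    local dir
    # The trailing slash makes the pattern match folders only;
    # each match is one PROJECT_NAME/BRANCH_NAME folder.
    for dir in "$root"/*/*/; do
        # If nothing matches, the pattern stays literal, so skip it.
        [ -d "$dir" ] || continue
        # du -sh prints the total size followed by the path.
        du -sh "$dir"
    done
}

list_branch_caches "$CACHE_ROOT"
```

Once you have confirmed that a branch is no longer needed, remove its folder under the matching PROJECT_NAME by hand. The script only reports sizes and deletes nothing itself.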