DataOps Runners are part of the customer's DataOps deployment. We do not directly manage the infrastructure of our clients' runner instances and therefore have no insight into the host's system health.
To assist our clients, we have created a runner health check script. This script is part of the DataOps Reference Project and allows you to regularly check for potential issues with the runner instance and address them in advance.
Health check overview
This script checks for the following characteristics of the runner instance:
- CPU and memory usage (at the time the pipeline runs)
- Full list of Docker images, containers, and volumes
- Full list of Docker networks
- Description of the Docker System Info (information about the instance, version, state, etc.)
- Available space (unreserved disk space on the system)
- Extended images info
- Extended container info
- Extended volumes info
- Extended network info
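The items in this list correspond roughly to standard Docker CLI commands. The following is a minimal sketch of how such a report could be collected, not the actual script shipped in the DataOps Reference Project. The names CHECKS and collect_report are hypothetical, and the exact commands and flags the real script uses are an assumption.

```python
import subprocess

# Hypothetical mapping from each check to a standard CLI command.
# The real health check script may use different commands or flags.
CHECKS = {
    "CPU and memory usage": ["docker", "stats", "--no-stream"],
    "Images": ["docker", "images", "--all"],
    "Containers": ["docker", "ps", "--all"],
    "Volumes": ["docker", "volume", "ls"],
    "Networks": ["docker", "network", "ls"],
    "System info": ["docker", "info"],
    "Available disk space": ["df", "-h"],
}


def collect_report(run=subprocess.run):
    """Run every check and return {check name: command output or error text}."""
    report = {}
    for name, command in CHECKS.items():
        try:
            result = run(command, capture_output=True, text=True, timeout=60)
            # Keep stderr when the command fails so the problem stays visible.
            report[name] = result.stdout if result.returncode == 0 else result.stderr
        except (OSError, subprocess.TimeoutExpired) as error:
            # The binary is missing or the daemon did not answer in time.
            report[name] = f"check failed: {error}"
    return report
```

Calling collect_report() and printing each entry would produce output comparable to the job log. The "extended" items would presumably come from the docker inspect family of commands, which is also an assumption.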
The Docker commands used by the script also emit environment variables and their values. All sensitive values are masked, including:
- SSL/TLS certificates
- API keys
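To illustrate the idea of masking, here is a minimal sketch that hides the value of any NAME=value pair whose name looks sensitive. The name patterns, the placeholder, and the function mask_sensitive are all hypothetical; the masking rules the real script applies are not documented here.

```python
import re

# Hypothetical name patterns; the real script's rules may differ.
SENSITIVE_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CERT)", re.IGNORECASE)
# NAME=value pairs; the value stops at whitespace, quotes, or a comma.
ASSIGNMENT = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)=([^\s\"',]+)")


def mask_sensitive(text, placeholder="[MASKED]"):
    """Replace the value of every NAME=value pair whose name looks sensitive."""
    def replace(match):
        name = match.group(1)
        if SENSITIVE_NAME.search(name):
            return f"{name}={placeholder}"
        return match.group(0)  # leave non-sensitive pairs untouched

    return ASSIGNMENT.sub(replace, text)
```

A name-based filter like this errs on the side of over-masking: any variable whose name merely contains one of the patterns is hidden as well.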
When the job runs, the script output is written to the job log. The same output is also available as an artifact.
Configuring the runner health check script
Step 1 - Edit the runner configuration
To set up this new job, you need to manually edit your config.toml file (the file that configures the runner). The default file location at the runner instance is
Add the following path to the volumes property:
volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
This mount is required so that the container can execute Docker commands against the Docker daemon and retrieve information about the host system.
Once you no longer need the health check script, we advise removing this volume path. Giving containers access to the Docker daemon and letting them run commands against it can introduce security issues.
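Whether the socket mount is actually in effect can be checked from inside a job container. The sketch below is an illustration only: the function docker_socket_available is hypothetical and not part of the health check script, and it only confirms that a Unix socket exists at the mounted path, not that the daemon accepts commands.

```python
import os
import stat

# Path taken from the volumes entry in config.toml.
DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_available(path=DOCKER_SOCKET):
    """Return True if `path` exists and is a Unix socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # The path does not exist or cannot be accessed.
        return False
    return stat.S_ISSOCK(mode)
```

The same check returning False after you remove the volume path is a quick way to confirm that containers no longer have access to the daemon.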
Step 2 - Include the job
Add the following to full-ci.yml or to the pipeline file that you want to add the job to:
## health check job
- project: reference-template-projects/dataops-template/dataops-reference
Step 3 - Run the pipeline
Run the pipeline, then check the job's log output and artifact.