DataOps Docker Runner Troubleshooting
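The two most common issues when installing the DataOps runner are the runner not connecting to the data product platform and the runner not using the current orchestrator image.
Each issue is covered below, followed by a credentials error specific to AWS EC2.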
Not connecting to the data product platform
The following steps will troubleshoot this issue:
-
Check that in the [[runners]] section of /srv/dataops-runner-<agent-name>/config/config.toml, the url key is set to https://app.dataops.live
-
From a shell on the runner host, check that you have network connectivity to the platform with the following command:
$ curl https://app.dataops.live
You should see an HTML response about redirection.
-
Check the dataops-runner is running with:
$ docker ps
And check your runner is listed and has the uptime/status you expect.
-
Check the DataOps runner logs using the following command:
$ docker logs <agent-name> 2>&1 | tail
The output should look something like this:
Checking for jobs... received job=435937 repo_url=https://app.dataops.live/dataops-demo-project/truedataops-demo-project.git runner=rFnpVSq2
Job succeeded duration=1m26.444532615s job=435933 project=2 runner=rFnpVSq2
Checking for jobs... received job=435938 repo_url=https://app.dataops.live/dataops-demo-project/truedataops-demo-project.git runner=rFnpVSq2
Job succeeded duration=1m28.049357128s job=435934 project=2 runner=rFnpVSq2
Checking for jobs... received job=435939 repo_url=https://app.dataops.live/dataops-demo-project/truedataops-demo-project.git runner=rFnpVSq2
Job succeeded duration=2m1.548508616s job=435930 project=2 runner=rFnpVSq2
Job succeeded duration=2m7.399498558s job=435937 project=2 runner=rFnpVSq2
Job succeeded duration=1m38.148697313s job=435939 project=2 runner=rFnpVSq2
Job succeeded duration=4m34.865132655s job=435932 project=2 runner=rFnpVSq2
Job succeeded duration=4m41.910581557s job=435938 project=2 runner=rFnpVSq2
If there are any connection issues with the runner, they are very likely output in the logs.
-
Check the DataOps runner within the data product platform and confirm that it has checked in recently.
-
If all of these look good, but the runner (agent) is not picking up jobs from your pipeline, check that you are using the right agent. It is easy to end up with two runners with the same name: one correctly configured, and another that is the one actually connected to your project.
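The checks above can be gathered into a few small shell helpers. This is only a sketch: the function names are hypothetical, it assumes the url key is written as a double-quoted string in config.toml, and it assumes the runner container is named after the agent, as in the docker logs command above.

```shell
#!/bin/sh
# Hypothetical helpers for the connectivity checks; none of these names come from DataOps.

PLATFORM_URL="https://app.dataops.live"

# Print the value of the first url key found in the given config.toml.
# Assumes the form: url = "https://..."
runner_url() {
    sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed if the configured url matches the expected platform URL
# (a trailing slash on either side is ignored).
check_runner_url() {
    actual="$(runner_url "$1")"
    expected="${2:-$PLATFORM_URL}"
    [ "${actual%/}" = "${expected%/}" ]
}

# Print the HTTP status code returned by the platform.
# The text describes a redirection response, so a 3xx code is the likely result.
platform_status() {
    curl -s -o /dev/null -w '%{http_code}' "${1:-$PLATFORM_URL}"
}

# Show the name and status of running containers whose name matches the agent name.
runner_status() {
    docker ps --filter "name=$1" --format '{{.Names}}: {{.Status}}'
}

# Show the last lines of the runner log, stderr included (default: 20 lines).
runner_logs() {
    docker logs "$1" 2>&1 | tail -n "${2:-20}"
}

# Example usage (replace <agent-name> with your own agent name):
#   check_runner_url "/srv/dataops-runner-<agent-name>/config/config.toml" || echo "url key mismatch"
#   platform_status
#   runner_status "<agent-name>"
#   runner_logs "<agent-name>" 10
```

Keeping each check in its own function makes it possible to run them one at a time in the same order as the steps above, stopping at the first one that fails.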
Not using the current orchestrator image
The expected behavior is that before the DataOps runner starts up any other orchestrator, it checks for new versions of the image with a docker pull.
However, very occasionally, this does not work.
To force this manually, you can either:
-
Run docker pull dataopslive/dataops-xxxxx-runner:5-stable on the orchestrator host to force pulling down the latest version of the orchestrator image, where xxxxx is the name of the orchestrator.
-
Run docker image prune -a, which will remove all images not currently being used by a container, forcing the latest versions to be downloaded the next time they are needed. Without the -a flag, docker image prune removes only dangling (untagged) images.
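To confirm that a manual pull actually refreshed the local copy, the pull can be followed by an inspection of the image. The sketch below uses a hypothetical helper name, refresh_image, and takes the image reference as an argument, so the xxxxx placeholder still has to be replaced with your orchestrator name.

```shell
#!/bin/sh
# Hypothetical helper: pull an image, then print its local ID and creation time.
refresh_image() {
    image="$1"
    # Stop early if the pull fails (for example, no network or a wrong image name).
    docker pull "$image" || return 1
    docker image inspect --format '{{.Id}} created {{.Created}}' "$image"
}

# Example usage (replace xxxxx with the name of the orchestrator):
#   refresh_image "dataopslive/dataops-xxxxx-runner:5-stable"
```

Comparing the printed image ID before and after the pull shows whether a newer image was downloaded.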
Unable to locate credentials on AWS EC2 with IMDSv2 enforced
When running DataOps pipelines on DataOps runners deployed in AWS EC2 that have Instance Metadata Service Version 2 (IMDSv2) enforced, it is necessary to ensure the HttpPutResponseHopLimit is set to at least 2 so that AWS commands running in jobs can access the AWS metadata endpoints from within their Docker container.
For example:
$ aws ec2 modify-instance-metadata-options --instance-id i-1234567898abcdef0 --http-put-response-hop-limit 3
Without this configuration, jobs are very likely to fail with errors saying Unable to locate credentials.
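Before changing the setting, it can help to check the instance's current hop limit. The following is a minimal sketch with hypothetical function names; it assumes the AWS CLI is installed and configured with permission to describe and modify the instance, and it raises the limit only to 2, the minimum stated above.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Print the current HttpPutResponseHopLimit for the given instance ID.
current_hop_limit() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].MetadataOptions.HttpPutResponseHopLimit' \
        --output text
}

# Succeed if the given hop limit is at least 2, the minimum needed
# for jobs running inside Docker containers.
hop_limit_ok() {
    [ "$1" -ge 2 ] 2>/dev/null
}

# Raise the hop limit to 2 only when the current value is too low.
ensure_hop_limit() {
    instance_id="$1"
    limit="$(current_hop_limit "$instance_id")"
    if hop_limit_ok "$limit"; then
        echo "hop limit is $limit, no change needed"
    else
        aws ec2 modify-instance-metadata-options \
            --instance-id "$instance_id" \
            --http-put-response-hop-limit 2
    fi
}

# Example usage:
#   ensure_hop_limit i-1234567898abcdef0
```

Checking first avoids modifying instances that are already configured correctly.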