DataOps Docker Runner Troubleshooting

The two most common issues when installing the DataOps runner are as follows:

  • Not connecting to the data product platform
  • Not using the current orchestrator image

Let's dive into each one of these issues.

Not connecting to the data product platform

The following steps will troubleshoot this issue:

  1. Check that in the [[runners]] section of /srv/dataops-runner-<agent-name>/config/config.toml, the url key is set to

  2. Using the runner CLI, check you have network connectivity with the following command:

    $ curl

    You should see an HTML response about redirection.

  3. Check the dataops-runner is running with:

    $ docker ps

    And check your runner is listed and has the uptime/status you expect.

  4. Check the DataOps runner logs using the following command:

    $ docker logs <agent-name> 2>&1 | tail

    The output should look something like this:

    Checking for jobs... received                       job=435937 repo_url= runner=rFnpVSq2
    Job succeeded duration=1m26.444532615s job=435933 project=2 runner=rFnpVSq2
    Checking for jobs... received job=435938 repo_url= runner=rFnpVSq2
    Job succeeded duration=1m28.049357128s job=435934 project=2 runner=rFnpVSq2
    Checking for jobs... received job=435939 repo_url= runner=rFnpVSq2
    Job succeeded duration=2m1.548508616s job=435930 project=2 runner=rFnpVSq2
    Job succeeded duration=2m7.399498558s job=435937 project=2 runner=rFnpVSq2
    Job succeeded duration=1m38.148697313s job=435939 project=2 runner=rFnpVSq2
    Job succeeded duration=4m34.865132655s job=435932 project=2 runner=rFnpVSq2
    Job succeeded duration=4m41.910581557s job=435938 project=2 runner=rFnpVSq2

    Any connection issues with the runner will most likely appear in these logs.

  5. Check the DataOps runner within the data product platform and confirm that it has checked in recently.

  6. If all of these look good, but the runner (agent) is not picking up jobs from your pipeline, check that you are using the right agent. It's very easy to end up with two runners with the same name, where one is correctly configured but the other, misconfigured one is the one connected to your project.
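
The host-side checks in steps 1, 3, and 4 can be collected into a few shell helpers. This is only a sketch: the function names are hypothetical and not part of the DataOps tooling, it assumes the runner container is named after the agent (as the docker logs command in step 4 suggests), and it assumes the url key sits on a single line of config.toml in the form url = "...".

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Step 1: print the value of the first `url` key found in a config.toml file.
# Assumes a simple single-line entry such as: url = "..."
runner_url() {
    sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Step 3: show the name and status of running containers matching the agent name.
runner_status() {
    docker ps --filter "name=$1" --format '{{.Names}}: {{.Status}}'
}

# Step 4: show recent log lines that mention errors, warnings, or failures.
# Returns non-zero when no such lines are found, because grep matched nothing.
runner_log_problems() {
    docker logs --tail 200 "$1" 2>&1 | grep -iE 'error|warn|fail'
}
```

Calling runner_url with the path to your config.toml prints the configured URL, which you can then compare against the value you expect. runner_status and runner_log_problems both take the agent name as their only argument.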

Not using the current orchestrator image

The expected behavior is that before the DataOps runner starts an orchestrator, it checks for a new version of the image with a docker pull. However, very occasionally, this does not work.

To force this manually, you can do either of the following on the orchestrator host:

  • Run docker pull dataopslive/dataops-xxxxx-runner:5-stable, where xxxxx is the name of the orchestrator, to pull down the latest version of the orchestrator image.
  • Run docker image prune, which removes images that are not currently being used, so that the latest versions are downloaded the next time they are needed. Note that without the -a flag, docker image prune removes only dangling (untagged) images; docker image prune -a removes all images not used by any container.
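
To see whether a manual pull actually changed anything, one option is to compare the local image ID before and after the pull. A minimal sketch, assuming the image reference is passed as an argument; the function name refresh_image is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull an image and report whether the local copy changed.
# Usage: refresh_image dataopslive/dataops-xxxxx-runner:5-stable
refresh_image() {
    image="$1"
    # If the image is not present locally yet, this lookup fails and the
    # pull below is reported as an update.
    before=$(docker image inspect --format '{{.Id}}' "$image" 2>/dev/null)
    docker pull "$image" >/dev/null || return 1
    after=$(docker image inspect --format '{{.Id}}' "$image")
    if [ "$before" = "$after" ]; then
        echo "unchanged"
    else
        echo "updated"
    fi
}
```

If this prints unchanged, the host already had the image that the tag currently points to.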

Unable to locate credentials on AWS EC2 with IMDSv2 enforced

When DataOps runners are deployed on AWS EC2 instances that enforce Instance Metadata Service Version 2 (IMDSv2), set the HttpPutResponseHopLimit to at least 2 so that AWS commands running in jobs can access the AWS metadata endpoints from within their Docker container. The container's network adds a hop, so a hop limit of 1 stops the metadata response before it reaches the job.

For example:

$ aws ec2 modify-instance-metadata-options --instance-id i-1234567898abcdef0 --http-put-response-hop-limit 3

Without this configuration, jobs are very likely to fail with errors saying Unable to locate credentials.
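
Before changing the setting, it can help to read the current value. A sketch, assuming the AWS CLI is configured with permission to describe the instance; the helper names are hypothetical, and the instance ID is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Succeed only if the given hop limit is high enough for jobs in containers.
hop_limit_ok() {
    [ "$1" -ge 2 ]
}

# Print the current HttpPutResponseHopLimit of an instance.
current_hop_limit() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].MetadataOptions.HttpPutResponseHopLimit' \
        --output text
}

# Example use (instance ID is a placeholder):
#   limit=$(current_hop_limit i-1234567898abcdef0)
#   hop_limit_ok "$limit" || echo "hop limit $limit is too low"
```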