Runner Overview
The DataOps runner is a long-running container that runs within a customer's infrastructure (on-premises or private cloud). It typically runs inside the on-premises or private network, for security among other reasons, so that the jobs in a DataOps pipeline can access resources that would otherwise be unreachable.
This DataOps runner regularly polls the DataOps application asking if there is any work for it to do, as seen in the gif below:
Follow the steps in Docker Runner Installation and Kubernetes Runner Installation to install and configure a DataOps runner for your compute environment. As part of the installation, the runner is associated with your group (preferred) or project in the data product platform.
You can have multiple DataOps runners in many locations, with each job executed by a specific runner. For instance, if you have a tool that needs orchestrating in Singapore and London, but no direct connectivity is allowed between these locations, install a separate DataOps runner at each location. This works because the DataOps solution is not limited by region.
The next question is how to get each runner to pick up only the jobs intended for it.
You tag each job with an identifying tag. For instance, let's assume the runners are named and tagged dataops-runner-singapore and dataops-runner-london, respectively. Tag a job with dataops-runner-singapore to indicate that the Singapore runner must run it, and tag a job with dataops-runner-london to indicate that the London runner must run it.
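The polling and tagging behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the runner's actual implementation: the Job and Runner classes, the poll method, and the job names are hypothetical, and the matching rule (a tagged job is eligible when every one of its tags is also on the runner) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a pipeline job waiting to be picked up."""
    name: str
    tags: set[str]


@dataclass
class Runner:
    """Hypothetical stand-in for a DataOps runner at one location."""
    name: str
    tags: set[str]
    completed: list[str] = field(default_factory=list)

    def poll(self, pending: list[Job]) -> Optional[Job]:
        """Ask the shared queue for work and claim the first eligible job."""
        for job in pending:
            # Assumed rule: the job must be tagged, and all of its tags
            # must also be present on this runner.
            if job.tags and job.tags <= self.tags:
                pending.remove(job)  # safe: we return immediately afterwards
                self.completed.append(job.name)
                return job
        return None  # no work for this runner in this polling cycle


# Jobs queued in the central application, each tagged for one location.
pending = [
    Job("load-london", {"dataops-runner-london"}),
    Job("load-singapore", {"dataops-runner-singapore"}),
]
singapore = Runner("singapore", {"dataops-runner-singapore"})
london = Runner("london", {"dataops-runner-london"})

# Each runner polls independently; neither needs a connection to the other.
for _ in range(2):  # two polling cycles are enough for this example
    for runner in (singapore, london):
        runner.poll(pending)

print(singapore.completed)
print(london.completed)
```

In this sketch the only shared component is the pending queue, which stands in for the DataOps application. Both runners reach out to it, and the tags alone decide which location executes which job.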
Runner versions
There are two versions of the DataOps runner currently available.
latest: This is the fully released DataOps runner and is the default version used when setting up a runner using Docker or Kubernetes.
latest-next: This version allows you to opt into the next DataOps runner release. The expectation is that it will not change significantly before becoming 'latest'.
Once a DataOps runner has been set up, it will not automatically update when latest or latest-next is updated. To upgrade your runner manually, see the upgrade section of the guides below.
Runner architecture
The following diagram shows the connectivity between the runner, the data product platform, and some secrets management solutions. In this example, the runner is deployed using Docker onto an EC2 instance, but you can deploy the runner to many environments. Check the physical prerequisites for the Docker runner here.
There are costs associated with deploying AWS resources. Use the AWS Pricing Calculator to estimate costs for any required infrastructure. Every AWS account has default quotas for each AWS service. Read more about how to manage your service quotas in the AWS documentation.