
DataOps Kubernetes Runner Scaling

To control how many jobs run concurrently as part of a pipeline, you can increase both the number of runners deployed to the cluster and the number of jobs each runner executes at once:

dataops-values.yml
```yaml
# ...
replicas: 3
runners:
  config: |
    concurrent = 8
    [[runners]]
      [runners.kubernetes]
      # ...
```

To deploy more than one runner pod to your Kubernetes cluster, set the replicas key in your dataops-values.yml file.

By default, each runner executes up to 8 jobs concurrently. To raise this limit, set the concurrent key in your dataops-values.yml file.

The maximum number of jobs that can run concurrently on the cluster is the number of replicas multiplied by the concurrent limit.
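As a worked example, using the values from the dataops-values.yml snippet above (3 replicas, concurrent = 8), and then the concurrent limit of 100 suggested below:

```latex
J_{\max} = \text{replicas} \times \text{concurrent} = 3 \times 8 = 24
\qquad
J_{\max} = 3 \times 100 = 300
```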

As a general rule, the number of replicas provides high availability and should exceed 3 only in rare cases. Match the number of replicas to the number of your cloud vendor's availability zones that your cluster is deployed across. To scale out your workloads and fit them to your cluster's capacity, increase the concurrent limit instead, for example to 100.
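A sketch of this guidance applied to a cluster that spans three availability zones, assuming the same file layout as the snippet above. The figure of 100 is the illustrative value from the text, not a recommendation for every cluster:

```yaml
# dataops-values.yml (sketch): one runner pod per availability zone,
# with scale-out handled by the per-runner concurrent limit.
replicas: 3            # assumes the cluster spans 3 availability zones
runners:
  config: |
    # Each runner may execute up to 100 jobs at once,
    # so the cluster-wide ceiling is 3 x 100 = 300 jobs.
    concurrent = 100
    [[runners]]
      [runners.kubernetes]
      # ... remaining Kubernetes executor settings unchanged
```

The concurrent limit caps how many jobs are started. It does not reserve capacity, so the cluster (or its autoscaler) still needs enough node resources for the job pods it schedules.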

Load balancing

The runner does not attempt to balance scheduled jobs across nodes. Instead, the cluster itself places job pods on nodes according to its configured scheduling behavior.

You can customize how pods are scheduled through the affinity, nodeSelector, and tolerations keys in your dataops-values.yml file.
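A minimal sketch of these keys, assuming they take the standard Kubernetes pod-spec shapes (as they do in Helm charts derived from the GitLab Runner chart). The node label dataops-workload=runners, the taint dedicated=dataops, and the instance type are hypothetical placeholders. Substitute the labels and taints your nodes actually carry:

```yaml
# dataops-values.yml (sketch): steer pods onto a dedicated node pool.
# The label, taint, and instance type below are hypothetical examples.
nodeSelector:
  dataops-workload: runners        # only nodes with this label are eligible

tolerations:
  - key: dedicated                 # allow scheduling onto nodes tainted
    operator: Equal                # with dedicated=dataops:NoSchedule
    value: dataops
    effect: NoSchedule

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50                 # soft preference, not a hard requirement
        preference:
          matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - m5.xlarge        # hypothetical instance type
```

In charts derived from the GitLab Runner chart, these top-level keys shape the runner pod itself. The job pods it launches are scheduled from settings under [runners.kubernetes] in the config block (for example node_selector), so check which of the two your chart version applies before relying on either.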

High availability

Running more than one runner helps ensure that jobs do not wait to be picked up and gives you some resilience against failures such as a node going down. We recommend matching the number of runners for a given group of projects to the number of your cloud vendor's availability zones.
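One way to make sure each runner replica actually lands in a different availability zone is a pod anti-affinity rule keyed on the well-known topology.kubernetes.io/zone node label. This is a sketch: it assumes the chart passes the affinity key through to the runner pod spec, and the selector label app: dataops-runner is a hypothetical placeholder for whatever labels your runner pods carry (check with kubectl get pods --show-labels):

```yaml
# dataops-values.yml (sketch): one runner replica per availability zone.
replicas: 3                       # assumes 3 availability zones
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: dataops-runner   # hypothetical; use your runner pods' labels
        # No two matching pods may be placed in the same zone.
        topologyKey: topology.kubernetes.io/zone
```

With a required rule, a fourth replica on a three-zone cluster would stay Pending. Use preferredDuringSchedulingIgnoredDuringExecution instead if a best-effort spread is enough.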

Runners and jobs are isolated, so a runner going down does not affect jobs that have already been scheduled. In this scenario, the deployment should schedule a new runner pod to replace it.