DataOps Kubernetes Runner Scaling
To manage the number of concurrent jobs executing as part of a pipeline, you can increase both the number of runners deployed to the cluster and the number of jobs a single runner executes:
...
replicas: 3
runners:
  config: |
    concurrent = 8
    [[runners]]
      [runners.kubernetes]
      ...
To deploy more than one runner pod to your Kubernetes cluster, set the replicas key in your dataops-values.yml file.
The concurrency of jobs for a runner is set to 8 by default. You can increase it by setting the concurrent key in your values.yml.
The maximum number of jobs that can run concurrently on the cluster is the number of replicas multiplied by the concurrent limit.
As a general rule, the number of replicas drives high availability and should only rarely exceed 3. Match the number of replicas to the number of cloud vendor availability zones your cluster is deployed across. To scale out your workloads and fit them to your cluster's capacity, increase your concurrent limit instead, e.g. to 100.
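As an illustration of that guidance, the following is a minimal sketch of a values file for a cluster assumed to span three availability zones. The layout mirrors the snippet at the top of this page. The specific numbers are illustrative only, and the remaining runner settings are left out.

```yaml
# Sketch only: assumes a cluster spread over three availability zones.
replicas: 3              # one runner pod per zone (assumption)
runners:
  config: |
    # Jobs each runner pod may execute at once, raised from the default of 8.
    concurrent = 100
    [[runners]]
      [runners.kubernetes]
      # remaining Kubernetes executor settings unchanged

# Upper bound on concurrent jobs across the cluster:
#   3 replicas x 100 concurrent = 300 jobs
```

The 300-job figure is only a ceiling. How many jobs actually run at once still depends on the capacity of the cluster's nodes.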
Load balancing
The runner does not try to balance the jobs being scheduled across nodes. Instead, jobs will be spread according to the configured behavior of the cluster by the cluster itself.
You can customize the scheduling behavior of pods through the affinity, nodeSelector, and tolerations keys in your values.yml file.
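The sketch below shows what these three keys might look like. It assumes they sit at the top level of the values file and accept the standard Kubernetes pod scheduling fields. Check your chart version for the exact placement and for which pods the settings apply to. All label keys, label values, taints, and zone names are hypothetical.

```yaml
# Sketch only: every label, taint, and zone name below is a hypothetical placeholder.
nodeSelector:
  workload: dataops-runner          # schedule only onto nodes carrying this label

tolerations:
  - key: "dedicated"                # tolerate a taint reserved for runner nodes
    operator: "Equal"
    value: "dataops"
    effect: "NoSchedule"

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone   # well-known Kubernetes zone label
              operator: In
              values:
                - zone-a
                - zone-b
```

Here nodeSelector is the simplest way to pin pods to a node pool, tolerations let pods land on tainted nodes, and affinity expresses richer rules such as restricting pods to particular zones.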
High availability
Having more than one runner helps ensure that jobs do not spend time waiting to be picked up, and it provides some resilience to failures such as a node going down. We recommend matching the number of runners for a given group of projects to the number of availability zones of your cloud vendor.
Runners and jobs are isolated, so a runner going down will not affect any jobs that have already been scheduled. In this scenario, the deployment should schedule a new runner pod.