DataOps Kubernetes Runner Installation

Step 1 - Secrets and volumes

You will need Kubernetes secrets and persistent volume claims created before installing the runner Helm chart.

Secrets

Docker registry

You will need an image pull secret to allow the pods to pull from the DataOps repository.

Create the secret in your cluster:

kubectl create secret docker-registry docker-creds \
  --docker-server=docker.io \
  --docker-username=dataopsread \
  --docker-password=dckr_pat_82FQ4O6N4yb6fXJc15kIvX4Qrtg \
  --docker-email=support@dataops.live

Note: These examples install the runner in the default Kubernetes namespace.

DataOps runner registration token

You will need a secret to hold the runner registration token.

The registration token is generated automatically in DataOps.live and links the runner you are about to create to your specific DataOps project or group.

Note: Registration tokens are scoped to the project or group you obtained them from.

Follow these steps to obtain your registration token:

Connect to the data product platform.
Open the group (preferred) or project you want to create the runner for.
At the group level, follow these steps:
  1. Click CI/CD → Runners. Choosing the group makes the runner available to all projects in that group.
  2. Expand Register a group runner on the top right and copy the registration token.
At the project level, follow these steps:
  1. Click Settings → CI/CD.
  2. Find the Runners section and click Expand.
  3. Copy the registration token from inside the Project runners section under Set up a project runner for a project.

Create the secret in your cluster, replacing REGISTRY_TOKEN with the copied token:

kubectl create secret generic reg-token \
  --from-literal=runner-registration-token=REGISTRY_TOKEN \
  --from-literal=runner-token=""

Note: runner-token="" must remain on the command line as an empty string. The actual value is populated automatically during registration.
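A small guard can catch the common mistake of running the command with the REGISTRY_TOKEN placeholder still in place. The following is a minimal sketch, not part of the DataOps tooling; the function name check_token and the TOKEN variable are hypothetical.

```shell
# Hypothetical helper (not part of DataOps or kubectl): reject an empty token
# or the literal REGISTRY_TOKEN placeholder before creating the secret.
check_token() {
  case "$1" in
    "" | REGISTRY_TOKEN)
      echo "invalid"
      return 1
      ;;
    *)
      echo "ok"
      ;;
  esac
}

# Example use (requires kubectl and cluster access), shown as a comment:
# check_token "$TOKEN" && kubectl create secret generic reg-token \
#   --from-literal=runner-registration-token="$TOKEN" \
#   --from-literal=runner-token=""
```

The guard only checks for the two obvious mistakes; it cannot tell whether the token is valid for your group or project.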

Volumes

We recommend using storage classes to create your persistent volumes dynamically. In this configuration, you need two persistent volume claims (PVCs), which the runner mounts at two paths (/agent_cache and /local_config in the configuration in Step 3).

Note: Avoid special characters such as . in PVC names for best compatibility.
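One way to apply this advice is to check candidate names against a deliberately conservative pattern before creating the claims. This is a sketch under that assumption: the helper is_safe_pvc_name is hypothetical, and its pattern (lowercase letters, digits, and hyphens only) is a choice made for this example, not a rule stated by DataOps.

```shell
# Hypothetical helper: succeed only for names made of lowercase letters,
# digits, and hyphens that start and end with a letter or digit (max 63 chars).
is_safe_pvc_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
}

# Check the two names used in this guide plus one name containing a dot.
for name in pipeline-cache-dataops-live local-config-dataops-live pipeline.cache; do
  if is_safe_pvc_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: avoid"
  fi
done
```

The two PVC names used in this guide pass the check, while the name containing a dot is flagged.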

The following examples cover Azure Kubernetes Service (AKS) and AWS Elastic Kubernetes Service (EKS).

Storage class for Azure File Storage:

afs-storageclass.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: afs-sc
provisioner: file.csi.azure.com # Replace with "kubernetes.io/azure-file" if your AKS version is earlier than 1.21
allowVolumeExpansion: true
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - actimeo=30
parameters:
  skuName: Premium_LRS

For example, the two PVCs using the Azure StorageClass above:

azure-file-storage-pvc-cache.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-cache-dataops-live
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: afs-sc
  resources:
    requests:
      storage: 5Gi

azure-file-storage-pvc-config.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-config-dataops-live
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: afs-sc
  resources:
    requests:
      storage: 5Gi

Learn more about Azure Dynamic Storage related to Kubernetes in the Microsoft docs.

Storage class for AWS Elastic File System (EFS):

efs-storageclass.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: <efs_filesystem_id> # Replace with the ID of your Elastic File System
  directoryPerms: "700"
  gidRangeStart: "1000" # optional
  gidRangeEnd: "2000" # optional
  basePath: "/dynamic_provisioning" # optional

For example, the two PVCs using the AWS StorageClass above:

aws-gp2-storage-pvc-cache.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-cache-dataops-live
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi

aws-gp2-storage-pvc-config.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-config-dataops-live
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi

Learn more about AWS storage related to Kubernetes in the AWS user guide.
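The PVC manifests above differ only in their name and storage class, so they can also be generated from a single template. The following is a minimal sketch; the helper pvc_manifest is hypothetical, and the 5Gi default mirrors the examples above.

```shell
# Hypothetical helper: print a PVC manifest for the given name and storage class.
# Usage: pvc_manifest NAME STORAGE_CLASS [SIZE]
pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: $2
  resources:
    requests:
      storage: ${3:-5Gi}
EOF
}

# Print both claims as one multi-document YAML stream.
# efs-sc is the AWS class from above; use afs-sc on Azure.
pvc_manifest pipeline-cache-dataops-live efs-sc
echo "---"
pvc_manifest local-config-dataops-live efs-sc

# To create the claims, pipe the same output to kubectl (requires cluster access):
# { pvc_manifest pipeline-cache-dataops-live efs-sc; echo "---"; \
#   pvc_manifest local-config-dataops-live efs-sc; } | kubectl apply -f -
```

After applying, kubectl get pvc lists the claims and their status.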

Step 2 - Add DataOps Helm chart repository

To install the DataOps Runner, you will need access to the Helm chart. Helm charts package templated Kubernetes manifests so they can be configured to fit customer environments.

Add the repository:

helm repo add dataops https://charts.dataops.live

Verify that the runner chart is available for installation:

helm search repo dataops

Update your local repository index with the latest versions of the chart:

helm repo update

Step 3 - Configure and install the runner

Create a new file named dataops-values.yml that will configure the DataOps runner Helm chart for your environment:

dataops-values.yml
imagePullSecrets:
  # Name of Docker registry credential created earlier.
  - name: "docker-creds"

image:
  # DataOps Runner version to use.
  tag: latest

# Main runner configuration
imagePullPolicy: Always
runners:
  # Equivalent to runner config.toml file contents.
  # https://docs.gitlab.com/runner/configuration/advanced-configuration.html
  config: |
    [[runners]]
      [runners.kubernetes]
        pull_policy = "always"
        namespace = "default"
        [[runners.kubernetes.volumes.pvc]]
          name = "pipeline-cache-dataops-live"
          mount_path = "/agent_cache"
        [[runners.kubernetes.volumes.pvc]]
          name = "local-config-dataops-live"
          mount_path = "/local_config"
  # Name of the runner, used to identify it in the cluster.
  name: <AGENT_NAME>
  # Tag used in jobs to specify this runner.
  tags: <AGENT_TAG>
  # Registration token secret name created earlier.
  secret: reg-token

Configure runner name and agent tag

First, review the config block and update the following settings in dataops-values.yml:

<AGENT_TAG>: set this value to your runner's tag.
<AGENT_NAME>: set this value to your runner's full name.
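If you keep dataops-values.yml as a template, the two placeholders can be filled in with sed instead of by hand. This is a sketch only: the helper fill_placeholders and the sample name and tag are hypothetical, and it assumes the values contain no characters that are special to sed.

```shell
# Hypothetical helper: read a values template on stdin and replace the
# <AGENT_NAME> and <AGENT_TAG> placeholders.
# Usage: fill_placeholders RUNNER_NAME RUNNER_TAG
# Assumes the name and tag contain no "/", "&", or "\", which are special to sed.
fill_placeholders() {
  sed -e "s/<AGENT_NAME>/$1/g" -e "s/<AGENT_TAG>/$2/g"
}

# Demo on a two-line sample; the name and tag here are made-up values.
printf 'name: <AGENT_NAME>\ntags: <AGENT_TAG>\n' | fill_placeholders my-k8s-runner my-k8s-tag
```

To process the real file, redirect it through the function, for example fill_placeholders my-k8s-runner my-k8s-tag < dataops-values.yml, and write the result to a new file.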

Adjust pull policies

Pull policies control how an image is fetched and updated by the runner. By default, the runner and orchestrators have an image pull policy of IfNotPresent and if-not-present, respectively. With these defaults, a node that already has an image cached keeps using it, even after a new version is released under the same tag.

Set imagePullPolicy to Always for an up-to-date runner.
Set pull_policy to always for up-to-date orchestrators.

With these settings, the registry is checked each time a container starts, and the image is downloaded if the cached copy is out of date. Read more about image pull policy and default image pull policy in the Kubernetes documentation.

Set namespace and persistent volumes

Finally, in your dataops-values.yml, adjust the following:

namespace: the namespace in which Kubernetes jobs run.
[[runners.kubernetes.volumes.pvc]]: the PVC configuration for orchestrators.

Note that two PVCs are required: pipeline-cache-dataops-live and local-config-dataops-live. See the Kubernetes executor documentation for the configuration syntax for your chosen volume storage.
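Because the PVC names in the config block must match the claims created in Step 1, it can help to list the names the values file actually references. The following is a minimal sketch; the helper referenced_pvcs is hypothetical and assumes each [[runners.kubernetes.volumes.pvc]] entry has a quoted name, as in the example above.

```shell
# Hypothetical helper: print the PVC names referenced by
# [[runners.kubernetes.volumes.pvc]] entries in a values file read from stdin.
referenced_pvcs() {
  awk '
    /\[\[runners\.kubernetes\.volumes\.pvc\]\]/ { in_pvc = 1; next }
    in_pvc && /name *=/ {
      split($0, parts, "\"")   # parts[2] is the text between the first pair of quotes
      print parts[2]
      in_pvc = 0
    }
  '
}

# Demo on an inline sample; for the real file, run: referenced_pvcs < dataops-values.yml
referenced_pvcs <<'EOF'
        [[runners.kubernetes.volumes.pvc]]
          name = "pipeline-cache-dataops-live"
          mount_path = "/agent_cache"
        [[runners.kubernetes.volumes.pvc]]
          name = "local-config-dataops-live"
          mount_path = "/local_config"
EOF
```

Compare the printed names with the output of kubectl get pvc in the namespace you configured.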

Install the Kubernetes runner

Install the runner using your chart values:

helm upgrade --install runner dataops/dataops-runner -f dataops-values.yml

success

DataOps runner chart installed! 🎉
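After the install, helm status runner shows the release state and kubectl get pods lists the pods. To spot pods that have not started, the pod list can be filtered, as in this sketch. The helper pods_not_running is hypothetical, and it assumes the default column layout of kubectl get pods, where STATUS is the third column.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the name of every pod whose STATUS column (third field) is not Running.
pods_not_running() {
  awk '$3 != "Running" { print $1 }'
}

# Example use (requires cluster access), shown as a comment:
# kubectl get pods --no-headers | pods_not_running
```

An empty result means every listed pod is Running. Pods of finished jobs, which report a status such as Completed, are also printed, so treat the output as a starting point rather than a definitive health check.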

Checking the runner's health

You can check whether your runner is still contacting the data product platform from your group or project CI/CD page. To do this:

1. Navigate to your group or project CI/CD settings and expand the Runners section. Depending on how you registered your runner, it appears under one or more of the available runner lists.
2. Find your runner on this page and click the runner ID.

This opens a detailed page where you can see more information about your runner. The Last contact field shows whether your runner is healthy.
