DataOps Runner Installation
The DataOps Runner is a long-running container that runs within a customer's infrastructure (on-premises or private cloud). Typically, it runs inside the on-premises/private network, for security reasons among others, so that the jobs in a DataOps pipeline can reach resources that would otherwise be inaccessible.
This DataOps Runner regularly polls the DataOps application asking if there is any work for it to do, as seen in the gif below:

Follow the steps in this section to install and configure a DataOps Runner for your compute environment. As part of the installation, the runner is associated with your group (preferred) or project in the DataOps application.
You can have multiple DataOps Runners in many locations, with each job executed by a specific runner. For instance, if you have a tool that needs orchestrating in Singapore and London, but no direct connectivity is allowed between these locations, the solution is to install a separate DataOps Runner at each location.
The next question is how to make each runner pick up only the jobs intended for it.
You tag each job with an identifying tag. For instance, let's assume the runners are named and tagged dataops-runner-singapore and dataops-runner-london, respectively. Tag a job with dataops-runner-singapore to indicate that the Singapore runner must run it, and with dataops-runner-london for the London runner.
DataOps Account
To get started, you'll need a DataOps account. If you don't have one, you can set one up by logging in to your Snowflake tenant, clicking "Partner Connect," and then selecting "DataOps". Alternatively, contact us at support@dataops.live.
Physical Infrastructure
The DataOps Runner must be installed on a Linux server, host, or VM in a location that has access to Snowflake and to all the other systems/tools you need to connect to from DataOps jobs.
The exact nature of the server/VM is up to you and will differ between bare metal, AWS, or Azure.
Minimum production specifications:
- Ubuntu 20.04 (18.04 is possible but not recommended)
- 4 CPU cores
- 16GB RAM
- Minimum 50GB Disk/Storage (300GB recommended)
- As a guide, for most use cases an AWS t3a.xlarge (or equivalent)
- A sudo user
Minimum PoC/Pilot specifications:
- Ubuntu 20.04 (18.04 is possible but not recommended)
- 2 CPU cores
- 8GB RAM
- Minimum 50GB Disk/Storage (300GB recommended)
- As a guide, for most use cases an AWS t3a.large (or equivalent)
- A sudo user
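The two specification lists above can be checked with a short script. The following is a minimal sketch, not an official DataOps tool: the function name check_specs is hypothetical, and checking the root filesystem for disk size is an assumption that may not match where Docker stores its data on your host.

```shell
# Compare a host against the minimum specifications listed above.
# Usage: check_specs <cores> <ram_gb> <disk_gb> <min_cores> <min_ram_gb>
# The 50GB disk minimum is the same for both profiles, so it is fixed here.
check_specs() {
    cores=$1 ram_gb=$2 disk_gb=$3 min_cores=$4 min_ram_gb=$5
    status=0
    [ "$cores" -ge "$min_cores" ] || { echo "FAIL: $cores CPU cores (need $min_cores)"; status=1; }
    [ "$ram_gb" -ge "$min_ram_gb" ] || { echo "FAIL: ${ram_gb}GB RAM (need ${min_ram_gb}GB)"; status=1; }
    [ "$disk_gb" -ge 50 ] || { echo "FAIL: ${disk_gb}GB disk (need 50GB)"; status=1; }
    [ "$status" -eq 0 ] && echo "OK: host meets the minimum specification"
    return "$status"
}

# Example for the production profile, gathering values from this host.
# RAM is rounded up because the kernel reports slightly less than the nominal size.
#   cores=$(nproc)
#   ram_gb=$(awk '/MemTotal/ {printf "%d", ($2 + 1048575) / 1048576}' /proc/meminfo)
#   disk_gb=$(df -Pk / | awk 'NR==2 {printf "%d", $2 / 1048576}')
#   check_specs "$cores" "$ram_gb" "$disk_gb" 4 16
```

For the PoC/Pilot profile, pass 2 and 8 as the last two arguments instead of 4 and 16.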
Network access
This server/host requires outbound internet access to, as a minimum:
- The DataOps app at https://app.dataops.live
- Docker Hub at https://hub.docker.com/
- The applicable Snowflake instances at https://...prefix....snowflakecomputing.com/
See the DataOps Architecture doc for a general overview of system architecture and the DataOps Security and Governance Appendix for a detailed discussion on networking.
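Outbound access to the endpoints listed above can be probed from the runner host before going further. This is a sketch under assumptions: the function names are hypothetical, curl is assumed to be installed, and the Snowflake URL must be supplied by you because it differs per account. Any HTTP response counts as reachable, since the aim is only to confirm that a connection can be made.

```shell
# Print the endpoints the runner host must reach. The Snowflake account URL
# is passed in by the caller because it differs per account.
required_endpoints() {
    snowflake_url=$1
    printf '%s\n' "https://app.dataops.live" "https://hub.docker.com/" "$snowflake_url"
}

# Send a HEAD request to each endpoint and report whether a connection
# could be made within 10 seconds.
check_endpoints() {
    required_endpoints "$1" | while read -r url; do
        if curl --silent --head --max-time 10 --output /dev/null "$url"; then
            echo "reachable:   $url"
        else
            echo "unreachable: $url"
        fi
    done
}

# Example (replace the argument with your own Snowflake account URL):
#   check_endpoints "$SNOWFLAKE_URL"
```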
Docker
Docker should be installed following the instructions at the Docker site for your OS of choice, e.g., for Ubuntu here: https://docs.docker.com/engine/install/ubuntu/.
caution
We recommend not installing Docker via the default Ubuntu repository, as this is often quite old.
Then (if you didn't do it as part of the Docker installation instructions):
- Run sudo usermod -aG docker $USER (this allows you to run docker without being root)
- Log out and log in
To test your Docker installation, run:
docker run hello-world
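If the test fails with a permission error, it can help to confirm that the group change has taken effect in your current session. A minimal sketch follows; the helper name in_docker_group is hypothetical.

```shell
# Succeeds if the given space-separated group list contains "docker".
in_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, using the groups of the current login session:
#   if in_docker_group "$(id -nG)"; then
#       echo "docker group is active"
#   else
#       echo "not in the docker group yet: log out and log in again"
#   fi
```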
DataOps Runner installation
You are now ready to install the runner itself.
Step 1 - Fetch Registration Tokens from DataOps
The Registration Token is generated automatically in DataOps.live (https://app.dataops.live) and is used to link the runner you are about to create with your specific DataOps project or group.
note
These registration tokens are scoped.
Follow these steps to obtain your Registration Token:
- Connect to the DataOps Platform UI
- Open the group (preferred) or project you want to create the runner for
- choosing the group makes the runner available to all projects in that group
- Go to Settings > CI / CD
- Find the Runners section and click Expand
- Then, inside the Specific runners section under Set up a specific runner manually, you will find the token
- Copy it
warning
Ignore the Install GitLab Runner on Kubernetes option, as we will be installing a bespoke DataOps Runner instead.
Step 2 - Connect to Docker Hub
On your runner host CLI run:
docker login --username dataopsreadserviceuser --password qf2h9372fg3ioug384
caution
Try prefixing the command with sudo if it doesn't work. However, needing sudo here means that the usermod -aG docker step hasn't been done correctly, which may cause issues later.
note
The dataopsreadserviceuser is a read-only service account that allows you to pull the dataopslive/dataops-runner image.
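Docker prints a warning when a password is passed with --password, because it can end up in shell history. As an optional alternative, the password can be piped in with --password-stdin. This is a sketch only: DOCKERHUB_PASSWORD is a hypothetical variable name, and it should hold the same password shown in the command above.

```shell
# Log in to Docker Hub without putting the password on the command line.
# DOCKERHUB_PASSWORD is a hypothetical variable name chosen for this sketch.
dockerhub_login() {
    if [ -z "${DOCKERHUB_PASSWORD:-}" ]; then
        echo "DOCKERHUB_PASSWORD is not set" >&2
        return 1
    fi
    printf '%s' "$DOCKERHUB_PASSWORD" |
        docker login --username dataopsreadserviceuser --password-stdin
}

# Example:
#   read -r -s -p "Password: " DOCKERHUB_PASSWORD   # bash: prompt without echoing
#   dockerhub_login
```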
Step 3 - Configure the Runner
First, set the following environment variables:
export DATAOPS_URL=https://app.dataops.live
export REGISTRATION_TOKEN=xxxxxxx # This is the token you copied from the UI in step 1
export AGENT_NAME=my-documentation-runner # Change this to your desired name xxx-runner
Then run the register command:
docker run --rm -e DEBUG=true -v /srv/dataops-runner-$AGENT_NAME/config:/etc/gitlab-runner dataopslive/dataops-runner register \
--non-interactive \
--executor "docker" \
--docker-image dataopslive/dataops-utils-runner \
--url "$DATAOPS_URL" \
--registration-token "$REGISTRATION_TOKEN" \
--description "$AGENT_NAME" \
--tag-list "$AGENT_NAME" \
--run-untagged="false" \
--locked="false" \
--access-level="not_protected"
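The register command above depends on three environment variables, and an empty one tends to produce a confusing failure. A small guard like the following can be run first. It is a sketch, and check_register_env is a hypothetical helper name rather than part of the runner image.

```shell
# Fail early if any variable used by the register command is empty or unset.
check_register_env() {
    missing=0
    for name in DATAOPS_URL REGISTRATION_TOKEN AGENT_NAME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_register_env && echo "ready to register $AGENT_NAME"
```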
You should now go back to the UI and, in the same location, see the new runner you created.
The runner has been given a random identifier that cannot be changed. The blue triangle in the diagram above indicates that the new runner has registered but is not yet running.
Step 4 - Additional Configuration
Several key configurations are required that are not set using the standard register command. Set these by running the following commands on your server:
# allow agent to run up to 8 concurrent jobs
sudo sed -i 's/concurrent = .*/concurrent = 8/' /srv/dataops-runner-$AGENT_NAME/config/config.toml
# have agent poll server every 1 second
sudo sed -i 's/check_interval = .*/check_interval = 1/' /srv/dataops-runner-$AGENT_NAME/config/config.toml
# mount /app into /local_config, plus /agent_cache and /secrets, inside every container started by this agent
sudo sed -i 's/ volumes =.*$/ volumes = ["\/app:\/local_config:rw","\/agent_cache:\/agent_cache:rw", "\/secrets:\/secrets:ro"]/' /srv/dataops-runner-$AGENT_NAME/config/config.toml
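Because sed makes no change and reports no error when a pattern does not match, it is worth confirming that the three settings were actually updated. The following sketch prints the relevant lines; show_runner_settings is a hypothetical helper name.

```shell
# Print the three settings changed above so they can be checked by eye.
# Takes the path to the runner's config.toml as its only argument.
show_runner_settings() {
    grep -E '^[[:space:]]*(concurrent|check_interval|volumes)[[:space:]]*=' "$1"
}

# Example:
#   show_runner_settings "/srv/dataops-runner-$AGENT_NAME/config/config.toml"
```

If the file is readable only by root, run the same grep expression with sudo instead of calling the function.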
Step 5 - Start the DataOps Runner
docker run -d --name $AGENT_NAME --restart always \
-v /srv/dataops-runner-$AGENT_NAME/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
dataopslive/dataops-runner
or, for extra debugging:
docker run -d -e DEBUG=true --name $AGENT_NAME --restart always \
-v /srv/dataops-runner-$AGENT_NAME/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
dataopslive/dataops-runner
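To confirm from the command line that the container started, the container state can be queried with docker inspect. This is a minimal sketch, and runner_status is a hypothetical helper name.

```shell
# Report whether the named runner container is up.
# Prints "running" or "not running" and sets the exit status to match.
runner_status() {
    # docker inspect fails if the container does not exist; treat that as not running.
    state=$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null) || state=false
    if [ "$state" = "true" ]; then
        echo "running"
    else
        echo "not running"
        return 1
    fi
}

# Example:
#   runner_status "$AGENT_NAME"
#   docker logs --tail 20 "$AGENT_NAME"   # recent runner output, useful with DEBUG=true
```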
You should now see:
Test this out!
At this point, you should be able to run a pipeline (e.g., the full-ci.yml created from the template project). If the first job in the pipeline changes to a blue pie, the job is running on your Runner, everything is connected, and you can move on.
If not, check your setup and, if needed, contact support@dataops.live.
Start and Stop the DataOps Runner
Once you have completed Steps 1 to 5 of the initial runner setup, you don't need to repeat the steps every time to start or stop the runner.
To start:
export AGENT_NAME=my-documentation-runner # Change this to your desired name xxx-runner
docker start $AGENT_NAME
To stop:
export AGENT_NAME=my-documentation-runner # Change this to your desired name xxx-runner
docker stop $AGENT_NAME
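If you start and stop the runner often, the two commands above can be wrapped in one small function. This is a convenience sketch only; runner_ctl is a hypothetical name, and the restart action simply maps to docker restart.

```shell
# Small wrapper around the start and stop commands above.
# Usage: runner_ctl start|stop|restart [agent-name]
# Falls back to $AGENT_NAME when no name is given.
runner_ctl() {
    action=$1
    name=${2:-${AGENT_NAME:-}}
    if [ -z "$name" ]; then
        echo "no agent name given and AGENT_NAME is not set" >&2
        return 2
    fi
    case "$action" in
        start|stop|restart) docker "$action" "$name" ;;
        *) echo "usage: runner_ctl start|stop|restart [agent-name]" >&2; return 2 ;;
    esac
}

# Example:
#   runner_ctl stop my-documentation-runner
```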
Credentials and Secrets
The basic security model of the DataOps platform and runner is that the platform and repository contain all the information about what should be done, but none of the credentials needed to do it. These credentials are stored on your DataOps Runner so that no one else has access to them.
tip
This process is described in more detail here, and it is advisable to read this page before proceeding any further.
DataOps Vault Setup
DataOps requires a directory on the host called /secrets that contains /secrets/vault.yml and /secrets/vault.salt.
To create the minimum base vault configuration, simply run:
sudo mkdir -p /secrets
echo {} | sudo tee /secrets/vault.yml > /dev/null
echo $RANDOM | md5sum | head -c 20 | sudo tee /secrets/vault.salt > /dev/null
Full details on when and how to add to these are here.
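After creating the base vault configuration, a quick check that both files exist can catch a mistyped path before the first pipeline run. The following is a sketch, and vault_ready is a hypothetical helper name.

```shell
# Check that a vault directory holds a vault.yml and a non-empty vault.salt.
# The directory defaults to /secrets, as used above.
vault_ready() {
    dir=${1:-/secrets}
    [ -f "$dir/vault.yml" ] || { echo "missing $dir/vault.yml" >&2; return 1; }
    [ -s "$dir/vault.salt" ] || { echo "missing or empty $dir/vault.salt" >&2; return 1; }
    echo "vault files present in $dir"
}

# Example:
#   vault_ready /secrets
```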