DataOps Runner Installation

The DataOps Runner is a long-running container that runs within a customer's infrastructure (on-premises or private cloud). Typically, it runs inside the on-premises/private network for security reasons (among others) to give the jobs in a DataOps pipeline access to otherwise inaccessible resources.

This DataOps Runner regularly polls the DataOps application asking if there is any work for it to do, as seen in the gif below:

DataOps Runner polls DataOps application for work

Follow the steps in this section to install and configure a DataOps Runner for your compute environment. As part of the installation, the runner is associated with your group (preferred) or project in the DataOps application.

You can have multiple DataOps Runners in many locations, with each job executed by a specific runner. For instance, if you have a tool that needs orchestrating in Singapore and London, but no direct connectivity is allowed between these locations, the solution is to install a separate DataOps Runner at each location.

The next question is how each runner picks up only the jobs intended for it.

You tag each job with an identifying tag. For instance, let's assume the runners are named and tagged dataops-runner-singapore and dataops-runner-london, respectively. Tag a job with dataops-runner-singapore to have the Singapore runner execute it, and with dataops-runner-london to have the London runner execute it.
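As a minimal sketch of the tagging described above, the snippet below prints two job definitions, one per location. It assumes jobs use the GitLab CI-style `tags:` keyword; the job names and script lines are hypothetical placeholders rather than part of any real project.

```shell
# Print two example job definitions, each pinned to one runner by its tag.
# Assumption: jobs use the GitLab CI-style "tags:" keyword.
# The job names and script contents are hypothetical.
print_tagged_jobs() {
cat <<'EOF'
"Load Data Singapore":
  script:
    - echo "runs on the Singapore runner"
  tags:
    - dataops-runner-singapore

"Load Data London":
  script:
    - echo "runs on the London runner"
  tags:
    - dataops-runner-london
EOF
}

print_tagged_jobs
```

Because the runners are registered with `--run-untagged="false"` later in this guide, a job without a matching tag would not be picked up by either runner.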

DataOps Account

To get started, you'll need a DataOps account. If you don't have one, you can set one up by logging in to your Snowflake tenant, clicking "Partner Connect," and then selecting "DataOps". Alternatively, contact us at support@dataops.live.

Physical Infrastructure

The DataOps Runner must be installed on a Linux server, host, or VM in a location with access to Snowflake and to all the other systems and tools that your DataOps jobs need to connect to.

The exact nature of the server/VM is up to you and will differ between bare metal, AWS, or Azure.

Minimum production specifications:

  • Ubuntu 20.04 (18.04 is possible but not recommended)
  • 4 CPU cores
  • 16GB RAM
  • Minimum 50GB Disk/Storage (300GB recommended)
  • As a guide, an AWS t3a.xlarge (or equivalent) suits most use cases
  • A sudo user

Minimum PoC/Pilot specifications:

  • Ubuntu 20.04 (18.04 is possible but not recommended)
  • 2 CPU cores
  • 8GB RAM
  • Minimum 50GB Disk/Storage (300GB recommended)
  • As a guide, an AWS t3a.large (or equivalent) suits most use cases
  • A sudo user
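The minimums above can be checked on a candidate host before installing anything. The following is a rough preflight sketch for the PoC/Pilot figures, assuming GNU coreutils (`nproc`, `df`) and `/proc/meminfo` are available and that the root filesystem is the one Docker will use. The function names `check_min` and `preflight` are illustrative, not part of DataOps.

```shell
# Preflight sketch for the PoC/Pilot minimums (2 cores, 8GB RAM, 50GB disk).
# Assumptions: GNU coreutils and /proc/meminfo exist; "/" is the disk Docker uses.

# check_min <label> <actual> <required>: print a result line, fail if below minimum.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (minimum $3)"
  else
    echo "FAIL $1: $2 (minimum $3)"
    return 1
  fi
}

# preflight: gather host figures and compare them with the minimums.
# The body runs in a subshell so its variables do not leak into the caller.
preflight() (
  rc=0
  cores=$(nproc)
  # MemTotal is in kB and sits slightly below the nominal size, so round up to whole GB.
  ram_gb=$(awk '/^MemTotal:/ {printf "%d", ($2 + 1048575) / 1048576}' /proc/meminfo)
  # Total size of "/" in whole GB, digits only.
  disk_gb=$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')
  check_min "CPU cores" "$cores" 2 || rc=1
  check_min "RAM (GB)" "$ram_gb" 8 || rc=1
  check_min "Disk (GB)" "$disk_gb" 50 || rc=1
  exit "$rc"
)
```

Call `preflight` on the target host; a non-zero exit status means at least one minimum was not met. For a production host, raise the thresholds to 4 cores and 16GB RAM.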

Network access

This Server/Host requires outbound internet access, as a minimum to :

See the DataOps Architecture doc for a general overview of system architecture and the DataOps Security and Governance Appendix for a detailed discussion on networking.

Docker

Install Docker by following the instructions on the Docker site for your OS of choice, e.g., for Ubuntu: https://docs.docker.com/engine/install/ubuntu/.

caution

We recommend not installing Docker from the default Ubuntu repository, as the version packaged there is often quite old.

Then (if you didn't do it as part of the Docker installation instructions):

  • Run sudo usermod -aG docker $USER (this allows you to run docker without being root)
  • Log out and log in

To test your Docker installation, run:

docker run hello-world
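If the test only works with sudo, the group change may not have taken effect in your current login session. A small sketch for checking this, assuming the standard `id` utility; the function name `session_in_group` is illustrative:

```shell
# session_in_group <group>: succeed if the current login session belongs to <group>.
# "id -nG" without a user argument reports the groups of the running session,
# which only picks up a usermod change after logging out and back in.
session_in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if session_in_group docker; then
  echo "This session is in the docker group"
else
  echo "This session is not in the docker group; rerun usermod, then log out and log in"
fi
```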

DataOps Runner installation

You are now ready to install the runner itself.

Step 1 - Fetch Registration Tokens from DataOps

The Registration Token is generated automatically in DataOps.live (https://app.dataops.live) and is used to link the runner you are about to create with your specific DataOps Project or Group.

note

These registration tokens are scoped: each token is tied to the group or project it was generated for.

Follow these steps to obtain your Registration Token:

  • Connect to the DataOps Platform UI
  • Open the group (preferred) or project you want to create the runner for
    • choosing the group makes the runner available to all projects in that group
  • Go to Settings > CI / CD
  • Find the Runners section and click Expand
  • Then, inside the Specific runners section under Set up a specific runner manually, you will find the token
  • Copy it

Runner Token

warning

Ignore the Install GitLab Runner on Kubernetes option, as we will be installing a bespoke DataOps Runner instead.

Step 2 - Connect to Docker Hub

On your runner host CLI run:

docker login --username dataopsreadserviceuser --password qf2h9372fg3ioug384

caution

Try prefixing the command with sudo if it doesn't work. However, needing sudo means that the usermod -aG docker step has not been completed correctly, which may cause issues later.

note

The dataopsreadserviceuser is a read-only service account to allow you to pull the dataopslive/dataops-runner image.

Step 3 - Configure the Runner

This can be done in a single command. First, set the following environment variables:

export DATAOPS_URL=https://app.dataops.live
export REGISTRATION_TOKEN=xxxxxxx # This is the token you copied from the UI in step 1
export AGENT_NAME=my-documentation-runner # Change this to your desired name xxx-runner

then:

docker run --rm -e DEBUG=true -v /srv/dataops-runner-$AGENT_NAME/config:/etc/gitlab-runner dataopslive/dataops-runner register \
--non-interactive \
--executor "docker" \
--docker-image dataopslive/dataops-utils-runner \
--url "$DATAOPS_URL" \
--registration-token "$REGISTRATION_TOKEN" \
--description "$AGENT_NAME" \
--tag-list "$AGENT_NAME" \
--run-untagged="false" \
--locked="false" \
--access-level="not_protected"

Now go back to the Runners section in the UI, where the newly registered runner should be listed, e.g.:

My New Runner

The runner has been given a random identifier that cannot be changed. The blue triangle in the diagram above indicates that the new runner has registered but is not yet running.
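Registration can also be checked from the host side. The sketch below assumes that the register command writes a `config.toml` containing a `[[runners]]` section into the mounted config directory, which is standard GitLab Runner behaviour but is not confirmed here for the DataOps image. The function name `registration_written` is illustrative.

```shell
# registration_written <path to config.toml>: succeed if the file is non-empty
# and contains at least one [[runners]] section.
registration_written() {
  [ -s "$1" ] && grep -q '^\[\[runners\]\]' "$1"
}

# Example (the path follows the volume mount used in the register command):
# registration_written "/srv/dataops-runner-$AGENT_NAME/config/config.toml" && echo "registered"
```

The config directory is created by Docker, so the file may only be readable from a root shell.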

Step 4 - Additional Configuration

Several key configurations are required that are not set using the standard register command. Set these by running the following commands on your server:

# allow agent to run up to 8 concurrent jobs
sudo sed -i 's/concurrent = .*/concurrent = 8/' /srv/dataops-runner-$AGENT_NAME/config/config.toml
# have agent poll server every 1 second
sudo sed -i 's/check_interval = .*/check_interval = 1/' /srv/dataops-runner-$AGENT_NAME/config/config.toml
# mount /app as /local_config, plus /agent_cache and /secrets (read-only), into every job container started by this agent
sudo sed -i 's/ volumes =.*$/ volumes = ["\/app:\/local_config:rw","\/agent_cache:\/agent_cache:rw", "\/secrets:\/secrets:ro"]/' /srv/dataops-runner-$AGENT_NAME/config/config.toml
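To preview what these substitutions do before touching the real file, the same expressions can be applied to a sample. The sample below is illustrative only and assumes the usual GitLab Runner `config.toml` layout; a real file generated by the register command contains more keys. The function names are hypothetical.

```shell
# Illustrative sample only: a real config.toml contains more keys than this.
sample_config() {
cat <<'EOF'
concurrent = 1
check_interval = 0

[[runners]]
  name = "my-documentation-runner"
  executor = "docker"
  [runners.docker]
    image = "dataopslive/dataops-utils-runner"
    volumes = ["/cache"]
EOF
}

# The same three substitutions as in Step 4, applied to stdin instead of in place.
apply_runner_settings() {
  sed -e 's/concurrent = .*/concurrent = 8/' \
      -e 's/check_interval = .*/check_interval = 1/' \
      -e 's/ volumes =.*$/ volumes = ["\/app:\/local_config:rw","\/agent_cache:\/agent_cache:rw", "\/secrets:\/secrets:ro"]/'
}

# Print the sample with the settings applied.
sample_config | apply_runner_settings
```

After running the real commands, `sudo grep -E 'concurrent|check_interval|volumes' /srv/dataops-runner-$AGENT_NAME/config/config.toml` shows whether all three lines were updated.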

Step 5 - Start the DataOps Runner

docker run -d --name $AGENT_NAME --restart always \
-v /srv/dataops-runner-$AGENT_NAME/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
dataopslive/dataops-runner

or, for extra debugging:

docker run -d -e DEBUG=true --name $AGENT_NAME --restart always \
-v /srv/dataops-runner-$AGENT_NAME/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
dataopslive/dataops-runner

You should now see:

My New Runner

Test this out!

At this point, you should be able to run a pipeline (e.g., the full-ci.yml created from the template project). If the first job in the pipeline changes to a blue pie, the job is running on your Runner, everything is connected, and you can move on.

If not, check your setup and, if needed, contact support@dataops.live.

initial job running

Start and Stop the DataOps Runner

Once you have completed Steps 1 to 5 of the initial runner setup, you don't need to repeat the steps every time to start or stop the runner.

To start:

export AGENT_NAME=my-documentation-runner   # Change this to your desired name xxx-runner
docker start $AGENT_NAME

To stop:

export AGENT_NAME=my-documentation-runner   # Change this to your desired name xxx-runner
docker stop $AGENT_NAME
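After starting or stopping, you may want to confirm the container's state. A minimal sketch using `docker inspect`; the function name `runner_state` is illustrative:

```shell
# runner_state <container name>: print the state Docker reports for the container
# (for example "running" or "exited"), or "missing" if no such container exists.
# The body runs in a subshell so the "state" variable stays local.
runner_state() (
  state=$(docker inspect --format '{{.State.Status}}' "$1" 2>/dev/null) || state="missing"
  echo "$state"
)

# Example:
# runner_state "$AGENT_NAME"
# docker logs --tail 20 "$AGENT_NAME"   # show the most recent runner log lines
```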

Credentials and Secrets

The basic security model of the DataOps platform and runner is that the platform and repository contain all the information about what should be done, but none of the credentials needed to do it. These credentials are stored on your DataOps Runner so that no one else has access to them.

tip

This process is described in more detail here, and it is advisable to read this page before proceeding any further.

DataOps Vault Setup

DataOps requires a directory on the host called /secrets, containing two files: /secrets/vault.yml and /secrets/vault.salt.

To create the minimum base vault configuration simply run:

sudo mkdir -p /secrets
echo {} | sudo tee /secrets/vault.yml > /dev/null
echo $RANDOM | md5sum | head -c 20 | sudo tee /secrets/vault.salt > /dev/null
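A quick way to confirm that both files were created is sketched below. The function name `vault_ready` is illustrative, and the check only covers existence and non-emptiness, not the validity of the contents.

```shell
# vault_ready <dir>: succeed if <dir> holds non-empty vault.yml and vault.salt files.
vault_ready() {
  [ -s "$1/vault.yml" ] && [ -s "$1/vault.salt" ]
}

# Example:
# vault_ready /secrets && echo "vault files present"
```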

Full details on when and how to add to these are here.