DataOps Vault

DataOps provides vault functionality to keep all confidential data private by storing it on the host machine in encrypted form. All secrets saved in the vault are available to pipelines for use by any job.

One of the first pipeline jobs initializes the vault, populating the secrets and other content before other jobs run. Depending on the project and pipeline configuration, this initialization is layered from different files and sources.

Vault Structure

The DataOps vault has a YAML-like structure, composed of a set of mandatory and optional objects. A typical vault structure looks like the following code snippet:

SNOWFLAKE:
  ACCOUNT: <account>
  TRANSFORM:
    USERNAME: <transform_username>
    ROLE: <transform_role>
    PASSWORD: <transform_password>
    WAREHOUSE: <transform_warehouse>
    THREADS: 8
  INGESTION:
    USERNAME: <ingestion_username>
    ROLE: <ingestion_role>
    PASSWORD: <ingestion_password>
    WAREHOUSE: <ingestion_warehouse>
    THREADS: 8
  MASTER:
    USERNAME: <master_username>
    ROLE: <master_role>
    PASSWORD: <master_password>
AWS:
  DEFAULT:
    S3_KEY: XXXXXXXXXX
    S3_SECRET: XXXXXXXXXX

The SNOWFLAKE section is currently standardized and mandatory, along with the AWS.DEFAULT credentials section. However, you can add other content to the vault outside these objects without causing any pipeline issues.

Vault Initialization

As this image shows, initializing the vault on each pipeline run comprises several layers that can add sensitive and non-sensitive content to the vault.

vault initialization steps from bootstrap to project settings

1. Local DataOps Runner Content

The original vault.yml file from the DataOps Runner's /secrets mount point is used as the base content for initializing the vault.

2. Sensitive Content

You can configure any pipeline to use a secrets manager from which sensitive values such as passwords and secure keys can be loaded. This content will add to and override values loaded in the previous step.

3. Additional Content

It is also possible to add a final layer of content to the vault by using a vault.template.yml file in the project itself. This can keep vault configurations local to the project (rather than the runner) and allow mapping of non-standard key/value naming schemes.
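The layering described above can be pictured as a deep merge in which later layers override earlier ones. A minimal Python sketch of that behavior (the deep_merge helper and the sample values are illustrative, not part of the platform):

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay into base; overlay values win on conflict."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Layer 1: the runner's vault.yml
runner = {"SNOWFLAKE": {"ACCOUNT": "acme", "MASTER": {"USERNAME": "old_user"}}}
# Layer 2: sensitive content from a secrets manager
secrets = {"SNOWFLAKE": {"MASTER": {"PASSWORD": "s3cret"}}}
# Layer 3: additional content from the project's vault.template.yml
template = {"SNOWFLAKE": {"MASTER": {"USERNAME": "ACME_MASTER", "ROLE": "DATAOPS_ADMIN"}}}

vault = deep_merge(deep_merge(runner, secrets), template)
print(vault["SNOWFLAKE"]["MASTER"])
# {'USERNAME': 'ACME_MASTER', 'PASSWORD': 's3cret', 'ROLE': 'DATAOPS_ADMIN'}
```

Note how the template layer overrides the runner's USERNAME while the secrets manager's PASSWORD survives untouched.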

Configure the DataOps Vault

The vault encryption is configured using a two-part method: a key and a salt.

  • The Vault Key is a random string of characters configured in each project, usually in file pipelines/includes/config/variables.yml using the variable DATAOPS_VAULT_KEY.
  • The Vault Salt is another random string contained in a file on the runner system. This is set up as part of the DataOps Runner Installation instructions.
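The documentation does not specify the vault's cipher internals, but the value of splitting the secret into a project-held key and a runner-held salt can be illustrated with a standard key-derivation function (hashlib.pbkdf2_hmac is used here purely as an illustration; the platform's actual KDF is internal):

```python
import hashlib

# Illustration only: neither part alone is enough to derive the encryption key.
vault_key = "project-specific-random-string"  # e.g. DATAOPS_VAULT_KEY in the project
vault_salt = b"runner-specific-random-salt"   # e.g. the salt file on the runner

derived = hashlib.pbkdf2_hmac("sha256", vault_key.encode(), vault_salt, 100_000)
print(derived.hex())  # a 32-byte key; changing either input changes the result

other = hashlib.pbkdf2_hmac("sha256", vault_key.encode(), b"different-salt", 100_000)
assert derived != other  # same project key, different runner salt -> different key
```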

1. Configure Local DataOps Runner Content

It is possible to omit all information from this vault configuration file and rely only on layers 2 and 3 (secrets manager content and additional project content). See below for an example of an empty vault configuration file.

note

Current system limitations require that the vault.yml file must exist on the runner. However, as described above, you can just set its content to an empty object {}.

An example of an empty vault.yml file is as follows:

{}

For more information, see the DataOps Runner Installation instructions.

2. Configure Secrets Manager

Most information security best practices mandate strong architectural security for protecting sensitive information. DataOps recommends using a secrets manager (such as AWS Secrets Manager or Azure Key Vault) for this purpose, and support for these systems is built into the DataOps platform.

Secrets loaded from a secrets manager will be applied to the vault after any local runner content.

note

This process may overwrite values. Therefore, care should be taken to demarcate which values are the responsibility of which platform.

Please see the Secrets Manager Orchestrator for full details of configuration and usage.

Configure Additional Vault Content

The above two methods are more than sufficient to provide configurability and security for many use cases. However, an additional layer of vault information can be supplied to pipelines, which is particularly useful in the following circumstances:

  • Moving non-sensitive configurations away from the runner's local vault.yml file
  • Re-mapping content loaded from a secrets manager into a different vault structure.

To address the first point, it is often more convenient to move configuration off the DataOps Runner, particularly values such as the Snowflake account name or the configured number of threads. Instead, add these values to a vault.template.yml file in the project, which is applied to the vault when pipelines run.

Secondly, it is not always possible or convenient to populate a secrets manager with value keys that precisely follow the DataOps vault structure. These values will still be loaded into the vault but at a different location, so a vault template can be used to re-map them into the desired places.
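As a sketch of this re-mapping, assuming a secrets-manager key named my.snowflake.password has already been loaded into the vault under that dotted path (the values here are illustrative):

```python
# Hypothetical: a secrets-manager entry "my.snowflake.password" lands in the
# vault under that dotted path before the template layer runs.
vault = {"my": {"snowflake": {"password": "s3cret"}}}

# A vault template line such as
#   PASSWORD: {{ my.snowflake.password }}
# effectively copies that value into the standard SNOWFLAKE.MASTER location:
vault.setdefault("SNOWFLAKE", {}).setdefault("MASTER", {})["PASSWORD"] = (
    vault["my"]["snowflake"]["password"]
)
print(vault["SNOWFLAKE"]["MASTER"]["PASSWORD"])  # s3cret
```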

To create a vault template, create a file in your project at vault-content/vault.template.yml. This file can look as follows:

SNOWFLAKE:
  ACCOUNT: {{ env.SNOWFLAKE_ACCOUNT }}
  MASTER:
    USERNAME: "{{ env.DATAOPS_PREFIX }}_MASTER"
    ## PASSWORD is set in Secrets Manager
    ROLE: DATAOPS_ADMIN
  TRANSFORM:
    USERNAME: "{{ env.DATAOPS_PREFIX }}_TRANSFORMATION"
    ## PASSWORD is set in Secrets Manager
    ROLE: "{{ env.DATAOPS_PREFIX }}_WRITER"
    WAREHOUSE: "{{ env.DATAOPS_PREFIX }}_TRANSFORMATION"
    THREADS: 8
  INGESTION:
    USERNAME: "{{ env.DATAOPS_PREFIX }}_INGESTION"
    ## PASSWORD is set in Secrets Manager
    ROLE: "{{ env.DATAOPS_PREFIX }}_WRITER"
    WAREHOUSE: "{{ env.DATAOPS_PREFIX }}_INGESTION"
    THREADS: 8

As this is a .template file (see below), we can include Jinja variables in the same manner as elsewhere in DataOps, with two primary benefits:

  • The ability to specify static values for vault keys (e.g. THREADS: 8)
  • The ability to initialize values from environment variables (e.g. ACCOUNT: {{ env.SNOWFLAKE_ACCOUNT }})

Furthermore, the vault.template.yml file can refer to values already in the vault, because it is applied after the vault has been initialized from the DataOps Runner's vault.yml (if it exists) and after any secrets manager information has been loaded. This allows information from the secrets manager to be re-mapped into other locations in the vault.

For example, if our secrets manager contains a key called my.snowflake.password, then we can map this into the vault in vault.template.yml as follows:

SNOWFLAKE:
  ...
  MASTER:
    ...
    PASSWORD: {{ my.snowflake.password }}

note

It is important not to introduce circular dependencies into the vault using this method. Only vault content loaded from a previous layer can be referenced in vault template variables.

Use the Vault

The most common methods for using values from the vault are via .template files and by setting variables directly using the DATAOPS_VAULT(...) syntax.

To set the INGESTION credentials in variables prefixed with SNOW_, you can use the following config:

variables:
  SNOW_ACCOUNT: DATAOPS_VAULT(SNOWFLAKE.ACCOUNT)
  SNOW_USER: DATAOPS_VAULT(SNOWFLAKE.INGESTION.USERNAME)
  SNOW_PASSWORD: DATAOPS_VAULT(SNOWFLAKE.INGESTION.PASSWORD)
  SNOW_ROLE: DATAOPS_VAULT(SNOWFLAKE.INGESTION.ROLE)
  SNOW_WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.INGESTION.WAREHOUSE)

The variables section can be defined in any job or pipeline configuration file.
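Conceptually, this substitution is a dotted-path lookup into the vault. A simplified Python sketch (not the platform's implementation; the resolve helper and sample vault are illustrative):

```python
import re

# Illustrative vault content after initialization
vault = {
    "SNOWFLAKE": {
        "ACCOUNT": "acme",
        "INGESTION": {"USERNAME": "ACME_INGESTION", "PASSWORD": "s3cret"},
    }
}

def resolve(value: str, vault: dict) -> str:
    """Replace each DATAOPS_VAULT(path.to.value) with the value at that vault path."""
    def lookup(match: re.Match) -> str:
        node = vault
        for part in match.group(1).split("."):
            node = node[part]
        return str(node)
    return re.sub(r"DATAOPS_VAULT\(([^)]+)\)", lookup, value)

print(resolve("DATAOPS_VAULT(SNOWFLAKE.INGESTION.USERNAME)", vault))
# ACME_INGESTION
```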

DataOps Templating

DataOps Template Rendering is used to extract secrets from the DataOps Vault and inject them into configuration files such as databases.template.yml as seen below.

Jinja variables can be included in templates using the {{ ... }} syntax, and the whole vault is scoped into the variable renderer so that you can use any vault path. For example, a template can include {{ SNOWFLAKE.ACCOUNT }} which will be rendered as the configured Snowflake account string from the vault.

Additionally, the full environment is available under the prefix env., so it is possible to render an environment variable into a template, for example, {{ env.DATAOPS_DATABASE }}. Template files can include Jinja variables and other control structures, allowing a flexible and configurable method for building dynamic content.
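The platform uses a full Jinja renderer; the following minimal sketch only illustrates the two scopes described above — vault paths at the top level and environment variables under the env. prefix (the render helper is an illustration, not the real engine):

```python
import os
import re

vault = {"SNOWFLAKE": {"ACCOUNT": "acme"}}       # illustrative vault content
os.environ["DATAOPS_DATABASE"] = "ACME_PROD"     # illustrative environment variable

def render(template: str) -> str:
    """Minimal stand-in for Jinja rendering: resolves {{ vault.path }} and {{ env.VAR }}."""
    def substitute(match: re.Match) -> str:
        parts = match.group(1).split(".")
        if parts[0] == "env":
            return os.environ[parts[1]]
        node = vault
        for part in parts:
            node = node[part]
        return str(node)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)

print(render("account={{ SNOWFLAKE.ACCOUNT }} db={{ env.DATAOPS_DATABASE }}"))
# account=acme db=ACME_PROD
```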

Vault Examples

  1. The first example is a SOLE database configuration as found in databases.template.yml:

databases:
  "{{ env.DATAOPS_DATABASE }}":
    {# For non-production branches, this will be a clone of production #}
    {% if (env.DATAOPS_ENV_NAME != 'PROD' and env.DATAOPS_ENV_NAME != 'QA') %}
    from_database: "{{ env.DATAOPS_DATABASE_MASTER }}"
    {% endif %}
    comment: This is the main DataOps database for environment {{ env.DATAOPS_ENV_NAME }}

This example creates a database whose name is defined by the environment variable DATAOPS_DATABASE. However, if the current environment is not PROD or QA, this database will be cloned from the production (master) database.

  2. The second example contains the SOLE configuration for multiple warehouses in warehouses.yml:

warehouses:
  {% for team_name in ['FINANCE', 'OPERATIONS', 'HR', 'SALES', 'MARKETING'] %}
  "{{ team_name }}":
    comment: Warehouse for {{ team_name }} team usage only
    warehouse_size: MEDIUM
    auto_suspend: 60
    auto_resume: true
    grants:
      USAGE:
        - FUNC_{{ team_name }}_ROLE
  {% endfor %}

The YAML code in this configuration file creates five identical warehouses without lengthy, repetitive configuration by passing an inline list of team names to a for loop.

Ingesting Variables from the Vault

Many simple uses of vault secrets do not require template files (see above), particularly when just passing secure values into an orchestrator, such as an access key for an API or login details for a remote system. In this case, you can create variables in the relevant job initialized from specific vault values using the DATAOPS_VAULT(...) syntax.

To use this syntax, create a variable in a job's variables block and set its value to DATAOPS_VAULT(path.to.vault.value). When the job runs, as long as the enclosed vault path is valid, this value is replaced with the corresponding value from the vault.

For example, this sample Talend Cloud job configuration loads the authentication token for the remote platform from the DataOps Vault:

Sample Talend Job:
  ...
  variables:
    TMC_TASK_ID: ...
    TMC_ACCESS_TOKEN: DATAOPS_VAULT(TALEND.EMEA.ACCESS_TOKEN)
    TMC_TASK_PARAMETERS: ...
  script: /dataops

note

The variable rendering mechanism is run within each job's /dataops entry point script. As a result, values can only be used by scripts and applications that run within orchestration scripts launched by /dataops.

Password Usage

If you need to set passwords as plain text on the DataOps platform, please refer to the guide below:

note

The usage in each template for all the passwords below is:

"{{ SNOWFLAKE.INGESTION.PASSWORD }}"

Description                      | Example          | Password in Secrets
---------------------------------|------------------|--------------------
Two Double-Quotes                | ""dAtaop3s!      | \"\"dAtaop3s!
Double-Quotes and a Single-Quote | "'dAtop3s!       | \"'dAtop3s!
Two Single-Quotes                | ''DataOps2"      | ''DataOps2"
At Sign                          | @Dataops2"       | @Dataops2"
Hash/Pound                       | #!a""Data2'      | #!a\"\"Data2'
Dollar Sign                      | dAt$op3s!        | dAt$$op3s!
Exclamation Mark                 | !Dataops"        | !Dataops"
Ampersand                        | &Dataops1"       | &Dataops1"
Open-Parenthesis                 | (Dataops1        | (Dataops1
Close-Parenthesis                | )Dataops1        | )Dataops1
Asterisk                         | *Dataops1"       | *Dataops1"
Plus-Sign                        | +Dataops1        | +Dataops1
Comma                            | ,Dataops1"       | ,Dataops1"
Period                           | .Dataops1        | .Dataops1
Slash                            | /Dataops1        | /Dataops1
Percent Sign                     | %Dataops1"       | %Dataops1"
Colon                            | :Dataops1"       | :Dataops1"
Semicolon                        | ;Dataops1        | ;Dataops1
Less-than Sign                   | <Dataops1        | <Dataops1
Equals Sign                      | =Dataops1        | =Dataops1
Question Mark                    | ?Dataops1        | ?Dataops1
Backslash                        | \Dataops1        | \\Dataops1
Square Brackets                  | []Dataops1"      | []Dataops1"
Caret                            | ^Dataops1        | ^Dataops1
Underscore                       | _Dataops1        | _Dataops1
Tilde                            | ~Dataops1        | ~Dataops1
Curly Brackets                   | {}Dataops1       | {}Dataops1
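Two substitutions are applied consistently throughout the table: dollar signs are doubled and backslashes are doubled (double quotes additionally gain a backslash escape in several rows). Assuming those two rules hold generally, the escaping could be sketched as follows (escape_for_vault is a hypothetical helper, not a platform function):

```python
def escape_for_vault(password: str) -> str:
    """Hypothetical helper: apply the two substitutions that are consistent
    across the table above (backslash doubling first, then dollar doubling)."""
    return password.replace("\\", "\\\\").replace("$", "$$")

print(escape_for_vault("dAt$op3s!"))   # dAt$$op3s!
print(escape_for_vault("\\Dataops1"))  # \\Dataops1
```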