VaultSpeed Orchestrator

Type: Pre-Set
Image: $DATAOPS_VAULTSPEED_RUNNER_IMAGE

The pre-set VaultSpeed orchestrator deploys your pre-configured VaultSpeed project into a Snowflake database.

Prerequisites

Usage

The VaultSpeed orchestrator deploys the data vault you have designed in VaultSpeed into your database. From there, you can iterate on ingesting your source data and set up a schedule that executes the Data Vault and Business Vault flows as configured in your VaultSpeed project.

You can use SOLE to generate the database into which your VaultSpeed data vault is deployed. You can also use MATE to pick up after the Business Vault has completed, building additional models and tables on top of it while still adhering to all DataOps.live principles and values.
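If SOLE manages the target database, the declaration can be as small as a single entry. Here is a minimal, illustrative sketch; the file name, templating syntax, and options are assumptions based on a typical SOLE project layout, so consult the SOLE documentation for your project:

# dataops/snowflake/databases.yml (illustrative sketch)
databases:
  "{{ env.DATAOPS_DATABASE }}":  # resolves per environment, matching DATAOPS_SNOWFLAKE_DATABASE below
    comment: Target database for the VaultSpeed data vault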

Let's look at the code of a standard VaultSpeed pipeline as found in the DataOps Reference Project:

Configure VaultSpeed Manager:
  extends:
    - .agent_tag
  stage: Additional Configuration
  image: $DATAOPS_VAULTSPEED_RUNNER_IMAGE
  script:
    - /dataops
  variables:
    DATAOPS_VAULTSPEED_ACTION: GENERATE_CODE
    DATAOPS_VAULTSPEED_GENERATION_TYPE: SNOWFLAKEDBT
    DATAOPS_VAULTSPEED_PROJECT: my_vaultspeed_project_name
    DATAOPS_VAULTSPEED_URL: DATAOPS_VAULT(VAULTSPEED.URL)
    DATAOPS_VAULTSPEED_USER: DATAOPS_VAULT(VAULTSPEED.USER)
    DATAOPS_VAULTSPEED_PASSWORD: DATAOPS_VAULT(VAULTSPEED.PASSWORD)
    DATAOPS_VAULTSPEED_OUTPUT: vaultspeed/target
    DATAOPS_SNOWFLAKE_ACCOUNT: DATAOPS_VAULT(SNOWFLAKE.SOLE.ACCOUNT)
    DATAOPS_SNOWFLAKE_USER: DATAOPS_VAULT(SNOWFLAKE.SOLE.USERNAME)
    DATAOPS_SNOWFLAKE_PASSWORD: DATAOPS_VAULT(SNOWFLAKE.SOLE.PASSWORD)
    DATAOPS_SNOWFLAKE_ROLE: DATAOPS_VAULT(SNOWFLAKE.SOLE.ROLE)
    DATAOPS_SNOWFLAKE_WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.SOLE.WAREHOUSE)
    DATAOPS_SNOWFLAKE_DATABASE: ${DATAOPS_DATABASE} # The name of the database depends on your environment
  artifacts:
    paths:
      - fl_flow_config.yml
      - bv_flow_config.yml
      - ${CI_PROJECT_DIR}/scripts
      - ${CI_PROJECT_DIR}/${DATAOPS_VAULTSPEED_OUTPUT}

FL Flow:
  stage: Raw Transformation
  inherit:
    variables: false
  trigger:
    strategy: depend
    include:
      - artifact: fl_flow_config.yml
        job: Configure VaultSpeed Manager
  variables:
    DATAOPS_VAULTSPEED_ACTION: TRIGGER_FMC
    DATAOPS_SNOWFLAKE_ACCOUNT: DATAOPS_VAULT(SNOWFLAKE.SOLE.ACCOUNT)
    DATAOPS_SNOWFLAKE_USER: DATAOPS_VAULT(SNOWFLAKE.SOLE.USERNAME)
    DATAOPS_SNOWFLAKE_PASSWORD: DATAOPS_VAULT(SNOWFLAKE.SOLE.PASSWORD)

BV Flow:
  stage: Business Transformation
  inherit:
    variables: false
  trigger:
    strategy: depend
    include:
      - artifact: bv_flow_config.yml
        job: Configure VaultSpeed Manager
  variables:
    DATAOPS_VAULTSPEED_ACTION: TRIGGER_FMC
    DATAOPS_SNOWFLAKE_ACCOUNT: DATAOPS_VAULT(SNOWFLAKE.SOLE.ACCOUNT)
    DATAOPS_SNOWFLAKE_USER: DATAOPS_VAULT(SNOWFLAKE.SOLE.USERNAME)
    DATAOPS_SNOWFLAKE_PASSWORD: DATAOPS_VAULT(SNOWFLAKE.SOLE.PASSWORD)
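Note that Additional Configuration, Raw Transformation, and Business Transformation are custom stage names, so they must be declared in the pipeline's stage list in this order. A sketch of the relevant ordering; the file path and the surrounding stage are assumptions based on a typical reference-project pipeline:

# pipelines/includes/config/stages.yml (illustrative sketch)
stages:
  - Pipeline Initialisation   # assumed preceding stage
  - Additional Configuration  # Configure VaultSpeed Manager runs here
  - Raw Transformation        # FL Flow child pipelines
  - Business Transformation   # BV Flow child pipelines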

The example pipeline defines the orchestrator's logic, stages, and behaviors, which run in a fixed order:

  1. At the stage Additional Configuration, the orchestrator checks its state to determine whether this is the first or a subsequent run. The state can be either INIT for the initial run or INCR for an incremental run, and is stored in the runner's persistent cache. You can reset the state by passing a variable (see the sketch after this list) or by deleting the cache from the runner.

    • INIT: If the state is INIT, because this is the orchestrator's first run or because you explicitly requested it, the job executes all the DDL (Data Definition Language) the VaultSpeed project needs: schemas, tables, procedures, and so on.
    • INCR: If the state is INCR, the job skips DDL execution and continues straight to the FMC (Flow Management Control) execution.
  2. At the stage Raw Transformation, the pipeline executes the Foundation Layer (FL) FMC logic for each source. The configuration for all your sources is dynamically generated into fl_flow_config.yml and passed on as an artifact, which triggers one child pipeline per source: the more sources you have, the more child pipelines are triggered.

    Again, there are two sets of behaviors here:

    • INIT: Triggers child pipeline executions regardless of any set schedule.
    • INCR: Triggers child pipelines based on a pre-determined schedule set in your VaultSpeed project. The value can be either a string, e.g., "15 minutes" or "1 hour 20 minutes", or a valid cron expression. Based on each source's schedule, some child pipelines are triggered and executed while others are skipped.

    For example, take two sources: source A is set to update every 15 minutes (schedule_interval: "15 minutes") and source B every 60 minutes (schedule_interval: "1 hour"). If you put your pipeline on a 15-minute schedule with that in mind, four consecutive scheduled runs would execute source A four times and source B just once. We recommend setting the pipeline schedule interval equal to the shortest (most frequent) source update interval, i.e., the "15 minutes" in this case.

  3. At the stage Business Transformation, the pipeline executes the Business Vault layer of models. This always happens after all jobs from the Raw Transformation stage have completed. The same logic applies:

    • INIT: Triggers child pipeline executions regardless of any set schedule, and only if all jobs from the Raw Transformation stage succeeded.
    • INCR: Similar to the Raw Transformation INCR logic, this triggers the FMC Business Vault run based on the schedule interval, and only if the previous stage completed successfully. Again, some child pipelines may be skipped due to the schedule, even if the previous stage was successful.
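As mentioned in step 1, you can override the detected state, and you can bypass the schedule controller entirely; both use documented parameters from the table below. A minimal sketch of the overrides, added to the configuration job's variables block. The value 1 for the skip flag is an assumption, since the orchestrator only checks whether the variable is set:

# Added to the "Configure VaultSpeed Manager" job's variables block
variables:
  DATAOPS_VAULTSPEED_TYPE_OF_RUN: INIT            # force a full (INIT) run, including DDL execution
  DATAOPS_VAULTSPEED_SKIP_SCHEDULE_CONTROLLER: 1  # ignore schedule_interval and always trigger the FMC flows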

Here is an example of what the pipeline nodes look like and what they mean:

[Image: Exemplar pipeline jobs]

In the image above, you have two executing sources, and the run type is INCR. You can expand the child pipeline nodes on the right to see the sub-jobs being run and read their logs.

Supported parameters

| Parameter | Required/Default | Description |
|---|---|---|
| DATAOPS_VAULTSPEED_URL | REQUIRED | Pulled from the vault; the value is usually https://vaultspeed.com/api |
| DATAOPS_VAULTSPEED_USER | REQUIRED | Pulled from the vault; your VaultSpeed login email |
| DATAOPS_VAULTSPEED_PASSWORD | REQUIRED | Pulled from the vault; your VaultSpeed login password |
| DATAOPS_VAULTSPEED_PROJECT | REQUIRED | Your VaultSpeed project name |
| DATAOPS_SNOWFLAKE_ACCOUNT | REQUIRED | Your Snowflake account |
| DATAOPS_SNOWFLAKE_USER | REQUIRED | Your Snowflake user |
| DATAOPS_SNOWFLAKE_PASSWORD | REQUIRED | Your Snowflake password |
| DATAOPS_SNOWFLAKE_ROLE | REQUIRED | Your Snowflake role |
| DATAOPS_SNOWFLAKE_WAREHOUSE | REQUIRED | Your Snowflake warehouse |
| DATAOPS_SNOWFLAKE_DATABASE | Optional; defaults to DATAOPS_DATABASE | The Snowflake database where your data vault resides |
| DATAOPS_VAULTSPEED_ACTION | Optional; defaults to GENERATE_CODE | Part of the logic that generates and triggers the child pipelines |
| DATAOPS_VAULTSPEED_DATA_VAULT_NAME | Optional | Your dv_name from the VaultSpeed project, in case you have several |
| DATAOPS_VAULTSPEED_DATA_VAULT_RELEASE | Optional | The name of your Data Vault release in VaultSpeed; the latest release is used by default |
| DATAOPS_VAULTSPEED_BUSINESS_VAULT_RELEASE | Optional | The name of your Business Vault release in VaultSpeed; the latest release is used by default. Caution: you can only use this parameter if you also set the corresponding DATAOPS_VAULTSPEED_DATA_VAULT_RELEASE |
| DATAOPS_VAULTSPEED_GENERATION_TYPE | Optional | Can be omitted; the only currently supported value is SNOWFLAKEDBT |
| DATAOPS_VAULTSPEED_FORCE_GENERATION | Optional | Forces generation of new code from VaultSpeed if changes you have made are not showing up; this forces an INIT run |
| DATAOPS_VAULTSPEED_TYPE_OF_RUN | Optional | Manually sets the run type to INIT or INCR |
| DATAOPS_VAULTSPEED_SKIP_SCHEDULE_CONTROLLER | Optional | If set, the orchestrator ignores the schedule interval set in VaultSpeed and always runs the FMC flows |
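The DATAOPS_VAULT(...) references in the example resolve dot-separated key paths from the DataOps vault. A sketch of the corresponding VaultSpeed entries, assuming the usual nested-YAML vault layout; the values shown are placeholders:

# Vault content (illustrative sketch); DATAOPS_VAULT(VAULTSPEED.URL) resolves to the URL key below
VAULTSPEED:
  URL: https://vaultspeed.com/api
  USER: user@example.com          # your VaultSpeed login email
  PASSWORD: "<your-password>"     # keep secrets in the vault, never in the repository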