data.world Orchestrator
Type | Pre-Set |
---|---|
Image | $DATAOPS_DATADOTWORLD_RUNNER_IMAGE |
Feature Status | PubPrev |
The data.world orchestrator is a pre-set orchestrator that interacts with the data.world data catalog to publish metadata about the data transformed in a DataOps pipeline. In short, it provides a single-click interface to the data.world service.
Usage
data.world:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: <org name>
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    # DW_BLOCK_PROFILE_UPLOAD: 1
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}
The data.world orchestrator assumes that a DataOps modeling and transformation job completed its run in an earlier stage of the DataOps pipeline. It leverages the metadata (data about data) model to provide up-to-date information to the catalog at the end of every pipeline run.
Supported parameters
Parameter | Required/Default | Description |
---|---|---|
DW_ORG | REQUIRED | The data.world organization that the dataset belongs to |
DW_DATASET | REQUIRED | The data.world dataset to update. The standard value is ddw-catalogs. Note: data.world may ask you to change this value. |
DW_AUTH_TOKEN | REQUIRED | The data.world authentication token |
DW_BLOCK_PROFILE_UPLOAD | Optional | If set, it prevents updating the metadata profile during a job run |
DW_META_DATA_TTL_FILE | Optional | If set, it will upload the data file using the specified name. Otherwise the data file will be uploaded as dataopslive-catalog. |
DW_META_PROFILE_TTL_FILE | Optional | If set, it will upload the meta file using the specified name. Otherwise the meta file will be uploaded as metadata-profile. |
Most of the configuration happens in the data.world application. When run, the orchestrator uploads a default profile file, which is sufficient to get started. Set the DW_BLOCK_PROFILE_UPLOAD variable to prevent changes made to the profile in the data.world application from being overwritten.
The DW_AUTH_TOKEN value stored in the DataOps Vault (at DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN in the examples on this page) is a valid user authentication token obtained from the data.world settings at https://data.world/settings/advanced.
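As a minimal sketch of the optional parameters in the table above, the job below keeps the profile already configured in data.world and uploads the data file under a custom name. The file name my-project-catalog is a hypothetical placeholder, and whether the name should carry a file extension is not stated here, so treat that value as an assumption to verify against your own setup.

```yaml
data.world:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: <org name>
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    # When set, the metadata profile is not updated during the job run.
    DW_BLOCK_PROFILE_UPLOAD: 1
    # Hypothetical name; if unset, the data file is uploaded as dataopslive-catalog.
    DW_META_DATA_TTL_FILE: my-project-catalog
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```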
Example jobs
This example adjusts the organization dynamically based on the DataOps context (dev, test, prod): depending on the context, the organization is dataopslivedev, dataopsliveqa, or dataopslive, respectively. Feature-branch databases (names containing _FB_) also map to dataopslivedev.
data.world:
  extends:
    - .agent_tag
  stage: "Data Catalog"
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: dataopslivedev  # default; alternatives: dataopslive, dataopsliveqa
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
  script:
    - if [[ $DATAOPS_DATABASE == *"_PROD" ]]; then export DW_ORG=dataopslive; fi
    - if [[ $DATAOPS_DATABASE == *"_QA" ]]; then export DW_ORG=dataopsliveqa; fi
    - if [[ $DATAOPS_DATABASE == *"_DEV" ]]; then export DW_ORG=dataopslivedev; fi
    - if [[ $DATAOPS_DATABASE == *"_FB_"* ]]; then export DW_ORG=dataopslivedev; fi
    - echo "DATABASE=$DATAOPS_DATABASE and DW_ORG=$DW_ORG"
    - /dataops
  icon: ${DATADOTWORLD_ICON}
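The chain of if statements above applies the last matching rule, so a feature-branch database whose name also ends in _PROD, _QA, or _DEV still resolves to dataopslivedev. One possible way to make that precedence explicit is a single case statement, sketched below under the assumption that the runner accepts multi-line script entries. The fallback branch is an addition for illustration and is not part of the original example.

```yaml
data.world:
  extends:
    - .agent_tag
  stage: "Data Catalog"
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: dataopslivedev
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
  script:
    - |
      # The first matching pattern wins, so the feature-branch rule is listed first.
      case "$DATAOPS_DATABASE" in
        *_FB_*) export DW_ORG=dataopslivedev ;;
        *_PROD) export DW_ORG=dataopslive ;;
        *_QA)   export DW_ORG=dataopsliveqa ;;
        *_DEV)  export DW_ORG=dataopslivedev ;;
        # Illustrative fallback: keep the default from the variables block.
        *)      echo "No rule matched $DATAOPS_DATABASE; keeping DW_ORG=$DW_ORG" ;;
      esac
    - echo "DATABASE=$DATAOPS_DATABASE and DW_ORG=$DW_ORG"
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```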
Project resources
The data.world orchestrator assumes that MATE has already run in the pipeline. It then leverages the MATE results, specifically table-level lineage, including tags, descriptions, and other metadata.
The orchestrator uses two intermediate files, the catalog and the manifest. Both files must be located in /dataops/modelling/target, a working directory of the standard MATE project found at /dataops/modelling/.
The details of these intermediate files are as follows:
- catalog.json: this file contains information from your data warehouse about the tables and views produced and defined by the resources in your project.
- manifest.json: this file contains a complete representation of your dbt project's resources (models, tests, macros, etc.), including all node configurations and resource properties.
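Because the orchestrator depends on both files, a job can fail fast with a clear message when MATE has not produced them. The sketch below is one way to do that, assuming the files are expected to be in place already when the script section starts and that the runner accepts multi-line script entries. The check itself is an illustration and not part of the orchestrator.

```yaml
data.world:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: <org name>
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
  script:
    - |
      # Illustrative pre-flight check: stop before the upload if either file is missing.
      for f in catalog.json manifest.json; do
        if [ ! -f "/dataops/modelling/target/$f" ]; then
          echo "Missing /dataops/modelling/target/$f; has MATE run earlier in this pipeline?"
          exit 1
        fi
      done
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```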
Host dependencies (and Resources)
The example configurations use a data.world access token stored in the DataOps vault at DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN.