data.world Orchestrator

Enterprise

Image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
Feature Status: PubPrev

The data.world orchestrator interacts with the data.world data catalog to publish metadata about the data transformed in a DataOps pipeline. This orchestrator provides a single-click interface to the data.world service.

Usage

pipelines/includes/local_includes/datadotworld_jobs/datadotworld.yml
data.world:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: <org name>
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    # DW_BLOCK_PROFILE_UPLOAD: 1
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}

The data.world orchestrator assumes that a DataOps modeling and transformation job completed its run in an earlier stage of the DataOps pipeline. It leverages the metadata (data about data) model to provide up-to-date information to the catalog at the end of every pipeline run.

Supported parameters

| Parameter | Required/Default | Description |
| --- | --- | --- |
| DW_ORG | REQUIRED | The data.world organization where the dataset fits |
| DW_DATASET | REQUIRED | The data.world dataset to update. The standard value is ddw-catalogs. Note: data.world may ask you to change this |
| DW_AUTH_TOKEN | REQUIRED | The data.world authentication token |
| DW_BLOCK_PROFILE_UPLOAD | Optional | If set, it prevents updating the metadata profile during a job run |
| DW_META_DATA_TTL_FILE | Optional | If set, it will upload the data file using the specified name. Otherwise, the data file will be uploaded as dataops.live-catalog. |
| DW_META_PROFILE_TTL_FILE | Optional | If set, it will upload the meta file using the specified name. Otherwise, the meta file will be uploaded as metadata-profile. |

Most of the configuration happens in the data.world application. On each run, the orchestrator uploads a default profile file, which is sufficient to get started. If you later change the profile in the data.world application, set the DW_BLOCK_PROFILE_UPLOAD variable so that the next run does not overwrite those changes.
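As a rough illustration of the optional parameters, the job's variables block could be extended as sketched below. The value 1 for DW_BLOCK_PROFILE_UPLOAD follows the commented-out line in the usage example. The upload name my-project.live-catalog is a hypothetical placeholder, not a value taken from the product.

```yaml
data.world:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: <org name>
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    # Keep profile edits made in the data.world application:
    # when this variable is set, the run does not update the metadata profile.
    DW_BLOCK_PROFILE_UPLOAD: 1
    # Hypothetical custom name for the uploaded data file;
    # when unset, the file is uploaded as dataops.live-catalog.
    DW_META_DATA_TTL_FILE: my-project.live-catalog
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```

Leaving both optional variables unset restores the default behavior described in the table above.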

The authentication token stored in the DataOps Vault (at DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN in the examples on this page) must be a valid user authentication token obtained from the data.world settings at https://data.world/settings/advanced.

Example jobs

This example dynamically adjusts the organization based on the DataOps context (dev, QA, prod). The script checks the suffix of DATAOPS_DATABASE and sets DW_ORG to dataopslivedev, dataopsliveqa, or dataopslive, respectively. Feature-branch databases (names containing _FB_) also use dataopslivedev.

pipelines/includes/local_includes/datadotworld_jobs/datadotworld.yml
data.world:
  extends:
    - .agent_tag
  stage: "Data Catalog"
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: dataopslivedev # default; overridden below with dataopsliveqa or dataopslive
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
  script:
    - if [[ $DATAOPS_DATABASE == *"_PROD" ]]; then export DW_ORG=dataopslive; fi
    - if [[ $DATAOPS_DATABASE == *"_QA" ]]; then export DW_ORG=dataopsliveqa; fi
    - if [[ $DATAOPS_DATABASE == *"_DEV" ]]; then export DW_ORG=dataopslivedev; fi
    - if [[ $DATAOPS_DATABASE == *"_FB_"* ]]; then export DW_ORG=dataopslivedev; fi
    - echo "DATABASE = $DATAOPS_DATABASE and DW_ORG=$DW_ORG"
    - /dataops
  icon: ${DATADOTWORLD_ICON}

Project resources

The data.world orchestrator assumes that MATE has already run in the pipeline. It then leverages the MATE results, specifically table-level lineage, including tags, descriptions, and other metadata.

The orchestrator uses two intermediate files, the catalog and the manifest. Both files must be located in /dataops/modelling/target, a working directory inside the standard MATE project at /dataops/modelling/.

The details of these intermediate files are as follows:

  • catalog.json - this file contains information from your data warehouse about the tables and views produced and defined by the resources in your project.

  • manifest.json - this file contains a complete representation of your dbt project's resources (models, tests, macros, etc.), including all node configurations and resource properties.
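The dependency on these two files could be made explicit with a guard step before the orchestrator runs, as in the sketch below. It assumes that both files are already present in the job's file system when the script section starts. If the orchestrator retrieves them itself as part of /dataops, the check does not apply. The guard is an illustration only and is not part of the orchestrator.

```yaml
data.world:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_RUNNER_IMAGE
  variables:
    DW_ORG: <org name>
    DW_DATASET: ddw-catalogs
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
  script:
    # Illustrative guard (an assumption, not orchestrator behavior):
    # stop with a clear message if either MATE artifact is missing.
    - |
      for f in catalog.json manifest.json; do
        if [ ! -f "/dataops/modelling/target/$f" ]; then
          echo "Missing /dataops/modelling/target/$f; has MATE run in an earlier stage?"
          exit 1
        fi
      done
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```

Failing in the guard step keeps the error message close to its cause: a missing MATE run, rather than a problem on the data.world side.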

Host dependencies (and Resources)

The example configurations use a data.world access token stored in the DataOps vault at DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN.