# data.world catalog orchestrator

Enterprise

| Image |
|---|
| $DATAOPS_DATADOTWORLD_CATALOG_RUNNER_IMAGE |
The data.world catalog orchestrator interacts with the data.world data catalog to publish metadata about the data transformed in a DataOps pipeline. In summary, the data.world catalog orchestrator provides a single-click interface to the data.world service.
## Usage

```yaml
data.world_v2:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_CATALOG_RUNNER_IMAGE
  variables:
    DATADOTWORLD_ACTION: START
    DW_ORG: <org name>
    DW_INSTANCE: <Private instance name>
    DW_DBT_PROJECT_NAME: <dbt_collection_name>
    DW_SNOWFLAKE_PROJECT_NAME: <snowflake_collection_name>
    DW_DBT_DATASET: <dbt-dataset>
    DW_SNOWFLAKE_DATASET: <snowflake-dataset>
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    DW_UPLOAD: "true"
    DATAOPS_DDW_SNOWFLAKE_URL: DATAOPS_VAULT(SNOWFLAKE.ACCOUNT).snowflakecomputing.com
    DATAOPS_DDW_DATABASE: ${DATAOPS_DATABASE}
    DATAOPS_DDW_WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.WAREHOUSE)
    DATAOPS_DDW_USER: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.USERNAME)
    DATAOPS_DDW_PASSWORD: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.PASSWORD)
    DATAOPS_DDW_ROLE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.ROLE)
    # DW_BLOCK_PROFILE_UPLOAD: 1
    DATAOPS_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/profiles
    DATAOPS_SECONDARY_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/modelling
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```
The data.world catalog orchestrator assumes that a DataOps modeling and transformation job completed its run earlier in the DataOps pipeline. It leverages the metadata (data about data) model to provide up-to-date information to the catalog at the end of every pipeline run.
## Supported parameters

| Parameter | Required/Default | Description |
|---|---|---|
| DATADOTWORLD_ACTION | REQUIRED | Action to be performed by the orchestrator. Must be `START` |
| DW_ORG | REQUIRED | The data.world organization where the dataset belongs |
| DW_INSTANCE | Optional (required for private instances) | Set only for a data.world private instance |
| DW_SNOWFLAKE_DATASET | REQUIRED | The data.world Snowflake dataset to update. The standard value is `ddw-catalogs`. Note: data.world may ask you to change this |
| DW_DBT_DATASET | REQUIRED | The data.world dbt dataset to update. The standard value is `ddw-catalogs`. Note: data.world may ask you to change this |
| DW_AUTH_TOKEN | REQUIRED | A secure, unique authentication token that lets you access and authenticate with data.world's services and APIs, e.g. to upload data, query datasets, and interact with the platform programmatically |
| DW_DBT_PROJECT_NAME | REQUIRED | The name of the dbt collection where the collector output is stored |
| DW_SNOWFLAKE_PROJECT_NAME | REQUIRED | The name of the Snowflake collection where the collector output is stored |
| DATAOPS_DDW_SNOWFLAKE_URL | REQUIRED | Snowflake URL used to connect to data.world |
| DATAOPS_DDW_DATABASE | REQUIRED | Snowflake database to connect to |
| DATAOPS_DDW_USER | REQUIRED | Snowflake username used to connect to the database |
| DATAOPS_DDW_ROLE | REQUIRED | Snowflake role used to run the query. Note: DATAOPS_DDW_ROLE must have access to the SNOWFLAKE.ACCOUNT_USAGE database/schema if DW_TAG_COLLECTION and DW_POLICY_COLLECTION are set to `true` |
| DATAOPS_DDW_PASSWORD | Optional | Snowflake password used to connect to the database |
| DATAOPS_SNOWFLAKE_AUTH | Optional (required for key-pair authentication) | Authentication method used to connect to Snowflake. See key-pair authentication for how to use it |
| DATAOPS_DDW_WAREHOUSE | Optional | Snowflake warehouse used to connect to data.world |
| DW_UPLOAD | Optional | If set, uploads the generated catalog to the organization account's catalogs dataset |
| DW_BLOCK_PROFILE_UPLOAD | Optional | If set, prevents updating the metadata profile during a job run |
| DATAOPS_TEMPLATES_DIR | REQUIRED | The directory where you place your query templates. The recommended setting is `$CI_PROJECT_DIR/dataops/profiles` |
| DATAOPS_SECONDARY_TEMPLATES_DIR | REQUIRED | The secondary directory where you place your query templates. The recommended setting is `$CI_PROJECT_DIR/dataops/modelling` |
| DW_TAG_COLLECTION | Optional - defaults to `true` | If set, the Snowflake collector harvests tags |
| DW_POLICY_COLLECTION | Optional - defaults to `true` | If set, the Snowflake collector harvests masking policies and row access policies |
| DW_LOG_LEVEL | Optional - defaults to `INFO` | The logging level as a string: INFO, WARN, ERROR, or DEBUG |
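As a sanity check before a pipeline run, the required parameters from the table above can be validated against the job's environment. The sketch below is illustrative and not part of the orchestrator; the helper name `missing_required` is hypothetical.

```python
import os

# Required variables from the supported-parameters table above
# (password/key-pair credentials are validated separately, since
# which one is required depends on the authentication method).
REQUIRED_VARS = [
    "DATADOTWORLD_ACTION", "DW_ORG", "DW_SNOWFLAKE_DATASET", "DW_DBT_DATASET",
    "DW_AUTH_TOKEN", "DW_DBT_PROJECT_NAME", "DW_SNOWFLAKE_PROJECT_NAME",
    "DATAOPS_DDW_SNOWFLAKE_URL", "DATAOPS_DDW_DATABASE", "DATAOPS_DDW_USER",
    "DATAOPS_DDW_ROLE", "DATAOPS_TEMPLATES_DIR",
    "DATAOPS_SECONDARY_TEMPLATES_DIR",
]

def missing_required(env=os.environ):
    """Return the required variables that are unset or empty."""
    return [v for v in REQUIRED_VARS if not env.get(v)]
```

Running this in a pre-flight script step lets the job fail fast with a clear message instead of a mid-run authentication or connection error.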
Make sure the service user associated with the organization has Manage access.
Most of the configuration happens on the data.world application. When run, the orchestrator uploads a default profile file. The default profile is sufficient to get started.
The `DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN` key in the DataOps vault must hold a valid user authentication token obtained from the data.world settings at https://data.world/settings/advanced.
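Before storing the token in the vault, you may want to confirm it is valid. A minimal sketch, assuming the public data.world REST API's `GET /v0/user` endpoint (which returns the authenticated user's profile, or HTTP 401 for a bad token); this helper is illustrative, not part of the orchestrator:

```python
import urllib.error
import urllib.request

def build_whoami_request(token: str) -> urllib.request.Request:
    """Build a request for the authenticated user's profile."""
    return urllib.request.Request(
        "https://api.data.world/v0/user",
        headers={"Authorization": f"Bearer {token}"},
    )

def check_token(token: str) -> bool:
    """Return True if data.world accepts the token, False on HTTP 401/403."""
    try:
        with urllib.request.urlopen(build_whoami_request(token)) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```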
## Authentication

### Key-pair authentication
The data.world catalog orchestrator supports using Snowflake key-pair authentication. To learn how to configure it, see the key-pair authentication documentation.
## Example jobs

This example dynamically adjusts the organization being used based on the DataOps context (dev, test, prod). In other words, depending on the context, the default organization changes from `dataopslivedev` to `dataopsliveqa` and `dataopslive`, respectively. Two variants are shown:

- Password-based authentication
- Key-pair-based authentication
### Password-based authentication

```yaml
Publish metadata and lineage:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_CATALOG_RUNNER_IMAGE
  variables:
    DATADOTWORLD_ACTION: START
    DW_ORG: dataopslive
    DW_INSTANCE: data.world
    DW_DBT_PROJECT_NAME: data-world-catalog
    DW_SNOWFLAKE_PROJECT_NAME: data-world-catalog
    DW_SNOWFLAKE_DATASET: ddw-staging
    DW_DBT_DATASET: ddw-staging
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    DW_UPLOAD: "true"
    DATAOPS_DDW_SNOWFLAKE_URL: DATAOPS_VAULT(SNOWFLAKE.ACCOUNT).snowflakecomputing.com
    DATAOPS_DDW_DATABASE: ${DATAOPS_DATABASE}
    DATAOPS_DDW_WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.WAREHOUSE)
    DATAOPS_DDW_ROLE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.ROLE)
    DATAOPS_DDW_USER: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.USERNAME)
    DATAOPS_DDW_PASSWORD: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.PASSWORD)
    DATAOPS_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/profiles
    DATAOPS_SECONDARY_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/modelling
    DW_TAG_COLLECTION: "false"
    DW_POLICY_COLLECTION: "false"
    DW_LOG_LEVEL: DEBUG
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```
### Key-pair-based authentication

```yaml
Publish metadata and lineage:
  extends:
    - .agent_tag
  stage: Data Catalog
  image: $DATAOPS_DATADOTWORLD_CATALOG_RUNNER_IMAGE
  variables:
    DATADOTWORLD_ACTION: START
    DW_ORG: dataopslive
    DW_INSTANCE: data.world
    DW_DBT_PROJECT_NAME: data-world-catalog
    DW_SNOWFLAKE_PROJECT_NAME: data-world-catalog
    DW_SNOWFLAKE_DATASET: ddw-staging
    DW_DBT_DATASET: ddw-staging
    DW_AUTH_TOKEN: DATAOPS_VAULT(DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN)
    DW_UPLOAD: "true"
    DATAOPS_DDW_SNOWFLAKE_URL: DATAOPS_VAULT(SNOWFLAKE.ACCOUNT).snowflakecomputing.com
    DATAOPS_DDW_DATABASE: ${DATAOPS_DATABASE}
    DATAOPS_DDW_WAREHOUSE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.WAREHOUSE)
    DATAOPS_DDW_ROLE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.ROLE)
    DATAOPS_DDW_USER: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.USERNAME)
    DATAOPS_SNOWFLAKE_AUTH: KEY_PAIR
    DATAOPS_SNOWFLAKE_KEY_PAIR: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.KEY_PAIR)
    DATAOPS_SNOWFLAKE_PASSPHRASE: DATAOPS_VAULT(SNOWFLAKE.TRANSFORM.PASSPHRASE)
    DATAOPS_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/profiles
    DATAOPS_SECONDARY_TEMPLATES_DIR: $CI_PROJECT_DIR/dataops/modelling
    DW_TAG_COLLECTION: "false"
    DW_POLICY_COLLECTION: "false"
    DW_LOG_LEVEL: DEBUG
  script:
    - /dataops
  icon: ${DATADOTWORLD_ICON}
```
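The context-to-organization mapping described above can be sketched as follows. The organization names come from the example; the helper is illustrative, not part of the orchestrator:

```python
# Map each DataOps context to its default data.world organization,
# mirroring the example: dev -> dataopslivedev, test -> dataopsliveqa,
# prod -> dataopslive.
DW_ORG_BY_CONTEXT = {
    "dev": "dataopslivedev",
    "test": "dataopsliveqa",
    "prod": "dataopslive",
}

def resolve_dw_org(context: str) -> str:
    """Return the DW_ORG value for a context, defaulting to the prod org."""
    return DW_ORG_BY_CONTEXT.get(context, "dataopslive")
```

In a real pipeline this selection would typically be done with context-specific variable files or CI rules rather than a script.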
## Project resources

The data.world catalog orchestrator assumes that MATE has already run in the pipeline. It then leverages the MATE results, specifically table-level lineage, including tags, descriptions, and other metadata.
The orchestrator uses four intermediate files: the catalog, manifest, dbt_project, and run_results. These files must be located at `/dataops/modelling/target`, the working directory of the standard MATE project found at `/dataops/modelling/`.
The details of these intermediate files are as follows:

- `catalog.json` - contains information from your data warehouse about the tables and views produced and defined by the resources in your project.
- `manifest.json` - contains a complete representation of your dbt project's resources (models, tests, macros, etc.), including all node configurations and resource properties.
- `dbt_project.yml` - every dbt project needs a `dbt_project.yml` file; this is how dbt knows a directory is a dbt project. It also contains important information that tells dbt how to operate on your project.
- `run_results.json` - contains information about a completed invocation of dbt, including timing and status info for each node (model, test, etc.) that was executed.
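A quick way to confirm the MATE stage produced everything the orchestrator needs is to check for these files before the Data Catalog stage runs. A minimal sketch, assuming the artifact location given above; the helper name is hypothetical:

```python
from pathlib import Path

# The four intermediate files the orchestrator expects, per the list above.
EXPECTED_ARTIFACTS = [
    "catalog.json",
    "manifest.json",
    "dbt_project.yml",
    "run_results.json",
]

def missing_artifacts(target_dir="/dataops/modelling/target"):
    """Return the expected artifact files absent from target_dir."""
    target = Path(target_dir)
    return [name for name in EXPECTED_ARTIFACTS
            if not (target / name).exists()]
```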
## Host dependencies (and Resources)

The example configurations use a data.world access token stored in the DataOps vault at `DATADOTWORLD.DEFAULT.DW_AUTH_TOKEN`.