
How to Support Wildcards in YAML Files

To give you more flexibility when building DataOps pipelines, the include: section of a <pipeline-name>-ci.yml file supports wildcards. You can split a large <pipeline-name>-ci.yml file into several smaller files, reuse the same configuration files in different places, and keep pipeline configuration files shorter and easier to read.

use with care

Use these wildcards with care, especially when the order in which the YAML files are loaded matters.

A single pipeline file frequently includes many files, as the following full-ci.yml example shows.

full-ci.yml
include:
# This loads a set of variables used in this pipeline
- project: "reference-template-projects/dataops-template/dataops-reference"
  ref: 5-stable
  file: "/pipelines/includes/base_bootstrap.yml"

- "/pipelines/includes/overrides/stages.yml"

# Includes all the required config and default files
- "/pipelines/includes/bootstrap.yml"

## ENTERPRISE SECRETS MANAGER INTEGRATION
- "/pipelines/includes/local_includes/secrets_management/aws_secrets_manager_load.yml"

## SNOWFLAKE OBJECT LIFECYCLE ENGINE
- "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake_lifecycle_aggregate.yml"
- "/pipelines/includes/local_includes/governance_demos/setup.yml"
- "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake-udf-deploy.yml"

## INGESTION ELT/ETL
- "/pipelines/includes/local_includes/modelling_and_transformation_jobs/modelling_and_transformations-salesandhr.yml"
- "/pipelines/includes/local_includes/sodasql_jobs/soda_testing.yml"
- "/pipelines/includes/local_includes/data_vault/data_vault_modelling.yml"
- "/pipelines/includes/local_includes/data_vault/data_vault_testing.yml"

## DOCUMENTATION AND DATA CATALOGING
- "/pipelines/includes/local_includes/datadotworld_jobs/datadotworld.yml"
- "/pipelines/includes/local_includes/reporter_jobs/role_explorer.yml"
- "/pipelines/includes/local_includes/modelling_and_transformation_jobs/meta_updates.yml"

# Includes all CSV ingestion files
- "/pipelines/includes/local_includes/csv_file_ingestion_jobs/1m_row_electricity_usage.yml"
- "/pipelines/includes/local_includes/csv_file_ingestion_jobs/airline_safety.yml"
- "/pipelines/includes/local_includes/csv_file_ingestion_jobs/Caret_Delim-Enc_iso8859_15.yml"
- "/pipelines/includes/local_includes/csv_file_ingestion_jobs/Semicolon_Delim-Quote_Num.yml"
- "/pipelines/includes/local_includes/csv_file_ingestion_jobs/Tab_Delim-Quote_All.yml"

# Includes all MS-SQL ingestion files
- "/pipelines/includes/local_includes/mssql_ingestion_jobs/mssql-person.yml"
- "/pipelines/includes/local_includes/mssql_ingestion_jobs/mssql-sales.yml"
- "/pipelines/includes/local_includes/mssql_ingestion_jobs/mssql-sales-incremental.yml"

# Includes all PostgreSQL ingestion files
- "/pipelines/includes/local_includes/postgresql_ingestion_jobs/postgresql-adventureworks.yml"

# Includes all DataPrep/Sync files
- "pipelines/includes/local_includes/dataprep_jobs/aws-aws.yml"
- "pipelines/includes/local_includes/dataprep_jobs/aws-azure.yml"
- "pipelines/includes/local_includes/dataprep_jobs/aws_zip-local.yml"

The following example simplifies part of this full-ci.yml file by using the * wildcard in the sections where several included files are stored in the same directory.

full-ci.yml
include:
# This loads a set of variables used in this pipeline
- project: "reference-template-projects/dataops-template/dataops-reference"
  ref: 5-stable
  file: "/pipelines/includes/base_bootstrap.yml"

- "/pipelines/includes/overrides/stages.yml"

# Includes all the required config and default files
- "/pipelines/includes/bootstrap.yml"

## ENTERPRISE SECRETS MANAGER INTEGRATION
- "/pipelines/includes/local_includes/secrets_management/aws_secrets_manager_load.yml"

## SNOWFLAKE OBJECT LIFECYCLE ENGINE
- "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake_lifecycle_aggregate.yml"
- "/pipelines/includes/local_includes/governance_demos/setup.yml"
- "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake-udf-deploy.yml"

## INGESTION ELT/ETL
- "/pipelines/includes/local_includes/modelling_and_transformation_jobs/modelling_and_transformations-salesandhr.yml"
- "/pipelines/includes/local_includes/sodasql_jobs/soda_testing.yml"
- "/pipelines/includes/local_includes/data_vault/data_vault_modelling.yml"
- "/pipelines/includes/local_includes/data_vault/data_vault_testing.yml"

## DOCUMENTATION AND DATA CATALOGING
- "/pipelines/includes/local_includes/datadotworld_jobs/datadotworld.yml"
- "/pipelines/includes/local_includes/reporter_jobs/role_explorer.yml"
- "/pipelines/includes/local_includes/modelling_and_transformation_jobs/meta_updates.yml"

# Includes all CSV ingestion files
- "/pipelines/includes/local_includes/csv_file_ingestion_jobs/*.yml"

# Includes all MS-SQL ingestion files
- "/pipelines/includes/local_includes/mssql_ingestion_jobs/*.yml"

# Includes all PostgreSQL ingestion files
- "/pipelines/includes/local_includes/postgresql_ingestion_jobs/postgresql-adventureworks.yml"

# Includes all DataPrep/Sync files
- "pipelines/includes/local_includes/dataprep_jobs/*.yml"

Lastly, as noted at the top of this doc, it is imperative to use this feature with care, especially if the order in which the files are included is critical.

For instance, consider the CSV ingestion section of the full-ci.yml file, shown in the snippet below. Without the * wildcard, the jobs are included in the order in which they are listed. Since job definitions are generally unique, it is safe to replace these entries with a wildcard.

full-ci.yml
include:
...
# Includes all CSV ingestion files
- '/pipelines/includes/local_includes/csv_file_ingestion_jobs/1m_row_electricity_usage.yml'
- '/pipelines/includes/local_includes/csv_file_ingestion_jobs/airline_safety.yml'
- '/pipelines/includes/local_includes/csv_file_ingestion_jobs/Caret_Delim-Enc_iso8859_15.yml'
- '/pipelines/includes/local_includes/csv_file_ingestion_jobs/Semicolon_Delim-Quote_Num.yml'
- '/pipelines/includes/local_includes/csv_file_ingestion_jobs/Tab_Delim-Quote_All.yml'
...

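However, if you tried to include every file under /pipelines/includes/ with a single wildcard such as /pipelines/includes/**/*.yml, the result would be undefined: when more than one file defines the same variable, there is no guarantee which definition is used.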
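
As a minimal sketch of this problem, assume two hypothetical files, variables_a.yml and variables_b.yml, in a hypothetical config directory, that both set a variable named TARGET_SCHEMA. The directory, the file names, the variable, and its values are illustrative assumptions and are not part of the reference project.

```yaml
# Hypothetical file: /pipelines/includes/local_includes/config/variables_a.yml
variables:
  TARGET_SCHEMA: STAGING
---
# Hypothetical file: /pipelines/includes/local_includes/config/variables_b.yml
variables:
  TARGET_SCHEMA: ANALYTICS
---
# full-ci.yml (excerpt): the wildcard matches both files above, so it is
# not defined which TARGET_SCHEMA value the pipeline ends up with.
include:
  - "/pipelines/includes/local_includes/config/*.yml"
```

If the order matters for files like these, one option is to list them explicitly instead of matching them with a wildcard, so that they are included in the order in which they are listed.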