How to Support Wildcards in YAML Files
For more flexibility in creating DataOps pipelines, we support wildcards in the include: section of the <pipeline-name>-ci.yml file. This lets you break your <pipeline-name>-ci.yml files into multiple smaller files, which improves reusability and readability. It is simpler to reuse the same configuration files in different places, and pipeline config files become less verbose and easier to read.
Use these wildcards with care, especially when the order in which the YAML files are included matters.
A single pipeline file frequently includes many files, as the following full-ci.yml pipeline file example shows.
include:
  # This loads a set of variables used in this pipeline
  - project: "reference-template-projects/dataops-template/dataops-reference"
    ref: 5-stable
    file: "/pipelines/includes/base_bootstrap.yml"
  - "/pipelines/includes/overrides/stages.yml"
  # Includes all the required config and default files
  - "/pipelines/includes/bootstrap.yml"
  ## ENTERPRISE SECRETS MANAGER INTEGRATION
  - "/pipelines/includes/local_includes/secrets_management/aws_secrets_manager_load.yml"
  ## SNOWFLAKE OBJECT LIFECYCLE ENGINE
  - "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake_lifecycle_aggregate.yml"
  - "/pipelines/includes/local_includes/governance_demos/setup.yml"
  - "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake-udf-deploy.yml"
  ## INGESTION ELT/ETL
  - "/pipelines/includes/local_includes/modelling_and_transformation_jobs/modelling_and_transformations-salesandhr.yml"
  - "/pipelines/includes/local_includes/sodasql_jobs/soda_testing.yml"
  - "/pipelines/includes/local_includes/data_vault/data_vault_modelling.yml"
  - "/pipelines/includes/local_includes/data_vault/data_vault_testing.yml"
  ## DOCUMENTATION AND DATA CATALOGING
  - "/pipelines/includes/local_includes/datadotworld_jobs/datadotworld.yml"
  - "/pipelines/includes/local_includes/reporter_jobs/role_explorer.yml"
  - "/pipelines/includes/local_includes/modelling_and_transformation_jobs/meta_updates.yml"
  # Includes all CSV ingestion files
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/1m_row_electricity_usage.yml"
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/airline_safety.yml"
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/Caret_Delim-Enc_iso8859_15.yml"
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/Semicolon_Delim-Quote_Num.yml"
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/Tab_Delim-Quote_All.yml"
  # Includes all MS-SQL ingestion files
  - "/pipelines/includes/local_includes/mssql_ingestion_jobs/mssql-person.yml"
  - "/pipelines/includes/local_includes/mssql_ingestion_jobs/mssql-sales.yml"
  - "/pipelines/includes/local_includes/mssql_ingestion_jobs/mssql-sales-incremental.yml"
  # Includes all PostgreSQL ingestion files
  - "/pipelines/includes/local_includes/postgresql_ingestion_jobs/postgresql-adventureworks.yml"
  # Includes all DataPrep/Sync files
  - "pipelines/includes/local_includes/dataprep_jobs/aws-aws.yml"
  - "pipelines/includes/local_includes/dataprep_jobs/aws-azure.yml"
  - "pipelines/includes/local_includes/dataprep_jobs/aws_zip-local.yml"
The following code example shows how to simplify part of this full-ci.yml file by using the * wildcard in the sections where multiple files are stored in the same directory.
include:
  # This loads a set of variables used in this pipeline
  - project: "reference-template-projects/dataops-template/dataops-reference"
    ref: 5-stable
    file: "/pipelines/includes/base_bootstrap.yml"
  - "/pipelines/includes/overrides/stages.yml"
  # Includes all the required config and default files
  - "/pipelines/includes/bootstrap.yml"
  ## ENTERPRISE SECRETS MANAGER INTEGRATION
  - "/pipelines/includes/local_includes/secrets_management/aws_secrets_manager_load.yml"
  ## SNOWFLAKE OBJECT LIFECYCLE ENGINE
  - "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake_lifecycle_aggregate.yml"
  - "/pipelines/includes/local_includes/governance_demos/setup.yml"
  - "/pipelines/includes/local_includes/snowflake_object_lifecycle_jobs/snowflake-udf-deploy.yml"
  ## INGESTION ELT/ETL
  - "/pipelines/includes/local_includes/modelling_and_transformation_jobs/modelling_and_transformations-salesandhr.yml"
  - "/pipelines/includes/local_includes/sodasql_jobs/soda_testing.yml"
  - "/pipelines/includes/local_includes/data_vault/data_vault_modelling.yml"
  - "/pipelines/includes/local_includes/data_vault/data_vault_testing.yml"
  ## DOCUMENTATION AND DATA CATALOGING
  - "/pipelines/includes/local_includes/datadotworld_jobs/datadotworld.yml"
  - "/pipelines/includes/local_includes/reporter_jobs/role_explorer.yml"
  - "/pipelines/includes/local_includes/modelling_and_transformation_jobs/meta_updates.yml"
  # Includes all CSV ingestion files
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/*.yml"
  # Includes all MS-SQL ingestion files
  - "/pipelines/includes/local_includes/mssql_ingestion_jobs/*.yml"
  # Includes all PostgreSQL ingestion files
  - "/pipelines/includes/local_includes/postgresql_ingestion_jobs/postgresql-adventureworks.yml"
  # Includes all DataPrep/Sync files
  - "pipelines/includes/local_includes/dataprep_jobs/*.yml"
Lastly, as noted at the top of this doc, it is imperative to use this feature with care, especially if the order in which the files are included is critical.
For instance, consider the CSV ingestion file section in the full-ci.yml file (see the code snippet below). Without the * wildcard, the jobs are included in the order in which they are listed. Since job definitions are generally unique, the include order does not matter here, and it is safe to replace the explicit list with a wildcard.
include:
  # ...
  # Includes all CSV ingestion files
  - '/pipelines/includes/local_includes/csv_file_ingestion_jobs/1m_row_electricity_usage.yml'
  - '/pipelines/includes/local_includes/csv_file_ingestion_jobs/airline_safety.yml'
  - '/pipelines/includes/local_includes/csv_file_ingestion_jobs/Caret_Delim-Enc_iso8859_15.yml'
  - '/pipelines/includes/local_includes/csv_file_ingestion_jobs/Semicolon_Delim-Quote_Num.yml'
  - '/pipelines/includes/local_includes/csv_file_ingestion_jobs/Tab_Delim-Quote_All.yml'
  # ...
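If, however, you tried to include all files with a single wildcard such as /pipelines/includes/**.*yml, the order in which the files are included would no longer be defined. When the same variable is set in more than one of those files, it is therefore undefined which value will be used.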
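One way to keep the benefit of wildcards without this risk is to list order-sensitive files explicitly and reserve the * wildcard for directories that only contain uniquely named job definitions. The snippet below is a minimal sketch of that idea, not a configuration taken from the reference project: the config directory, the two file names in it, and the LOAD_MODE variable are hypothetical and exist only for illustration.

```yaml
# Sketch only: the "config" directory, its two files, and LOAD_MODE are
# hypothetical names used for illustration.
# Assume both files set the same variable:
#   defaults.yml        -> variables: { LOAD_MODE: full }
#   local_overrides.yml -> variables: { LOAD_MODE: incremental }
include:
  # Order-sensitive files are listed one by one, so their order is known
  - "/pipelines/includes/config/defaults.yml"
  - "/pipelines/includes/config/local_overrides.yml"
  # Directories holding only uniquely named jobs can safely use a wildcard
  - "/pipelines/includes/local_includes/csv_file_ingestion_jobs/*.yml"
```

With this layout, the two files that set LOAD_MODE are always included in the listed order, while the CSV ingestion jobs can still be picked up through the wildcard.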