Cheat Sheet

Overview

The data product platform is extremely powerful but requires a bit of knowledge. Like most powerful systems, the things you use daily will become second nature. This cheat sheet is for everything else.

Conventions

By convention, all VARIABLES, ENVIRONMENT_VARIABLES, or PLACE_HOLDERS in templates or pipeline definition files are upper case. This makes them easy to distinguish from most other text and configuration, which is usually lower case, e.g.:

Run stack sales sources:
  extends: .modelling_and_transformation_base
  variables:
    TRANSFORM_ACTION: RUN
    TRANSFORM_MODEL_SELECTOR: "tag:source_stack_sales"

or

dbname: "{{ DATABASE }}" # Snowflake database name
user: "{{ SNOWFLAKE_USERNAME }}" # Snowflake user
password: "{{ SNOWFLAKE_PASSWORD }}" # Plain string or vault encrypted

Note that template rendering and variable substitution are case-sensitive.

Project structure

The default project structure for any DataOps repository should be:

  • /dataops
  • /dataops/snowflake: all configurations for the DataOps Snowflake Object Lifecycle Engine
  • /dataops/modelling: all configurations for the DataOps Modelling and Transformation Engine
  • /pipelines: all configurations for DataOps Pipelines

You can create any other root-level directories for storing other code/configuration related to other orchestrators e.g. for Talend Jobs.

For more information, see DataOps Project Structure.

Git workflow and Git command line

If you are unfamiliar with git, we recommend reading Git in 30 Seconds.

Naming conventions

Branches

The following branch names have special meanings in a DataOps project and should not be used for any other purpose:

  • main
  • qa
  • dev

Branches can have any other name as long as it contains no white space or special characters, but the following best practices are strongly recommended:

  • A branch name must immediately tell other people what is in this branch
  • Where possible, a branch name should have a reference to the ticketing system/project management system
  • Don't use main, qa, or dev as part of other branch names. A really good branch name would be something like: DATATEAM-435-add-length-of-service-to-HR-Employee-Consumption-Model
  • Remember that the branch name will be used to create a dynamic Feature Database (see below)
  • Many tutorials suggest having branch names such as feat/new-table-creation. The / character will cause problems with DataOps projects, and so should be avoided.
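One lightweight way to enforce these rules is a pre-flight check in a helper script. The sketch below is hypothetical (not part of the platform): it accepts only letters, digits, hyphens, and underscores, and rejects the reserved branch names:

```python
import re

RESERVED = {"main", "qa", "dev"}

def is_safe_branch_name(name: str) -> bool:
    """Return True if the name is usable as a DataOps feature branch.

    Hypothetical helper: accepts letters, digits, hyphens, and underscores
    only (no white space, no '/'), and rejects the reserved branch names.
    """
    if name in RESERVED:
        return False
    return re.fullmatch(r"[A-Za-z0-9_-]+", name) is not None

print(is_safe_branch_name("DATATEAM-435-add-length-of-service"))  # True
print(is_safe_branch_name("feat/new-table-creation"))             # False: '/' causes problems
print(is_safe_branch_name("main"))                                # False: reserved
```

Running such a check locally (or as an early pipeline step) catches a problematic branch name before it produces an unusable Feature Database.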

For more information, see Branching Strategies.

Database structure

Database names

The data product platform automatically creates databases as needed using the following naming:

[DATABASE_NAME_PREFIX]_[DATABASE_IDENTIFIER]

DATABASE_NAME_PREFIX is set in the -ci.yml file. By default, this is set to DATAOPS. DATABASE_IDENTIFIER is calculated using logic in the -ci.yml. The default behavior is:

  • If branch=main then DATABASE_IDENTIFIER=PROD and therefore full DATABASE would be something like DATAOPS_PROD
  • If branch=qa then DATABASE_IDENTIFIER=QA and therefore full DATABASE would be something like DATAOPS_QA
  • If the branch name is anything else, then DATABASE_IDENTIFIER=FEATURE_[BRANCH_NAME] with all non-alphanumeric characters removed, e.g. if the branch name is DATATEAM-435-add-length-of-service-to-HR-Employee-Consumption-Model then the full DATABASE would be DATAOPS_FEATURE_DATATEAM435ADDLENGTHOFSERVICETOHREMPLOYEECONSUMPTIONMODEL. This is at the very upper limit of a reasonable DATABASE name length, although Snowflake technically supports up to 255 characters.
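The default behavior above can be sketched in a few lines. This is an illustrative reimplementation only (the real logic lives in the -ci.yml file), assuming the default DATABASE_NAME_PREFIX of DATAOPS:

```python
import re

def database_name(branch: str, prefix: str = "DATAOPS") -> str:
    """Illustrative sketch of the default branch-to-database mapping."""
    if branch == "main":
        identifier = "PROD"
    elif branch == "qa":
        identifier = "QA"
    else:
        # Feature branches: drop every non-alphanumeric character, upper-case the rest
        identifier = "FEATURE_" + re.sub(r"[^A-Za-z0-9]", "", branch).upper()
    return f"{prefix}_{identifier}"

print(database_name("main"))  # DATAOPS_PROD
print(database_name("qa"))    # DATAOPS_QA
print(database_name("DATATEAM-435-add-length-of-service-to-HR-Employee-Consumption-Model"))
# DATAOPS_FEATURE_DATATEAM435ADDLENGTHOFSERVICETOHREMPLOYEECONSUMPTIONMODEL
```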

For more information, see Database Objects Namespacing.

Schema names

Schema names should follow the naming convention [source|business]_[stack_name]_[mda_layer].

For example, in a system ingesting from two source systems, hr and sales (source stacks), and serving two sets of business needs (business stacks), hr and sales forecasting, the following schemas would exist:

source_hr_curation
source_hr_ingestion
source_sales_curation
source_sales_ingestion
business_hr_calculation
business_hr_consumption
business_sales_forecasting_calculation
business_sales_forecasting_consumption
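The convention is mechanical enough to generate. The sketch below (hypothetical helper, not part of the platform) rebuilds the schema list above from the two source stacks and two business stacks in the example:

```python
def schema_name(stack_type: str, stack_name: str, mda_layer: str) -> str:
    """Compose a schema name per [source|business]_[stack_name]_[mda_layer]."""
    if stack_type not in ("source", "business"):
        raise ValueError("stack_type must be 'source' or 'business'")
    return f"{stack_type}_{stack_name}_{mda_layer}"

# The example system: two source stacks and two business stacks
layers = {"source": ["curation", "ingestion"],
          "business": ["calculation", "consumption"]}
stacks = {"source": ["hr", "sales"],
          "business": ["hr", "sales_forecasting"]}

for stack_type, names in stacks.items():
    for stack in names:
        for layer in layers[stack_type]:
            print(schema_name(stack_type, stack, layer))
```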

Modeling and transformation

Directory structure

Modelling and Transformation projects often end up with a large number of SQL and YAML files in various subdirectories.

Underneath /dataops/modelling, there are several standard directories:

  • /dataops/modelling/models - this is where most of your work will be done
  • /dataops/modelling/macros - this is where custom test definitions, macros, etc., are stored. See the detailed documentation for Modelling and Transformation
  • /dataops/modelling/snapshots - this is where you will define your slowly changing dimension tables (sometimes referred to as snapshots)

Modeling naming

Model names should follow the naming convention [schema_name]_[model_name].[sql|yml]. Following the previous example, this might create:

/dataops/modelling/sources/hr/ingestion.yml
/dataops/modelling/sources/hr/source_hr_curation_employee.yml
/dataops/modelling/sources/hr/source_hr_curation_employee.sql
/dataops/modelling/sources/sales/ingestion.yml
/dataops/modelling/sources/sales/source_sales_curation_orders.yml
/dataops/modelling/sources/sales/source_sales_curation_orders.sql
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_calculation_salestotals.sql
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_calculation_salestotals.yml
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_calculation_salesmissed.sql
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_calculation_salesmissed.yml
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_consumption_salescommission.sql
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_consumption_salescommission.yml
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_consumption_salesperformance.sql
/dataops/modelling/models/business/sales_forecasting/business_sales_forecasting_consumption_salesperformance.yml
Note: there is some duplication between the directory path and the filename because the filename must be unique across the whole modeling and transformation project.
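Because the filename embeds the full schema name, it can be derived mechanically. A hypothetical helper sketching the [schema_name]_[model_name].[sql|yml] rule:

```python
def model_filename(schema_name: str, model_name: str, ext: str = "sql") -> str:
    """Build a model filename per [schema_name]_[model_name].[sql|yml]."""
    if ext not in ("sql", "yml"):
        raise ValueError("ext must be 'sql' or 'yml'")
    return f"{schema_name}_{model_name}.{ext}"

print(model_filename("business_sales_forecasting_consumption", "salescommission"))
# business_sales_forecasting_consumption_salescommission.sql
print(model_filename("source_hr_curation", "employee", ext="yml"))
# source_hr_curation_employee.yml
```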

Useful DataOps.live tricks

Make a commit without running a pipeline

Include [skip ci] in your commit message from any Git client, not just the WebIDE. For example:

[Screenshot: commit message containing [skip ci]]