
Cheat Sheet

Overview

The DataOps platform is extremely powerful but requires a bit of knowledge. Like most powerful systems, the things you use every day will become second nature. This cheat sheet is for everything else.

Conventions

By convention, all VARIABLES, ENVIRONMENT_VARIABLES, or PLACE_HOLDERS in templates or pipeline definition files are in upper case. This makes them more clearly identifiable against most other text and configuration, which is usually lower case, e.g.

Run stack sales sources:
  extends: .modelling_and_transformation_base
  variables:
    TRANSFORM_ACTION: RUN
    TRANSFORM_MODEL_SELECTOR: "tag:source_stack_sales"

or

dbname: "{{ DATABASE }}" # Snowflake database name
user: "{{ SNOWFLAKE_USERNAME }}" # Snowflake user
password: "{{ SNOWFLAKE_PASSWORD }}" # Plain string or vault encrypted

Note that template rendering and variable substitution are case sensitive.
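The case sensitivity can be illustrated with a minimal sketch (this is not the actual DataOps renderer, just an assumed simplification of `{{ NAME }}` substitution):

```python
import re

def render(template: str, variables: dict) -> str:
    """Replace {{ NAME }} placeholders with values from `variables`.

    Keys are matched exactly, so {{ DATABASE }} and {{ database }}
    are different placeholders. Unmatched placeholders are left as-is.
    """
    def substitute(match):
        name = match.group(1)  # exact, case-sensitive key
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

# upper-case key matches the upper-case placeholder:
render('dbname: "{{ DATABASE }}"', {"DATABASE": "DATAOPS_PROD"})
# a lower-case key does NOT match, so the placeholder survives unchanged:
render('dbname: "{{ DATABASE }}"', {"database": "DATAOPS_PROD"})
```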

Project Structure

The default project structure for any DataOps repository should be:

/dataops
/dataops/snowflake This is where all configuration for the DataOps Snowflake Object Lifecycle Engine goes
/dataops/modelling This is where all configuration for the DataOps Modelling and Transformation Engine goes
/pipelines This is where all configuration for DataOps Pipelines goes

You can create any other root-level directories for storing code/configuration related to other systems, e.g. Talend jobs.

For full details refer to DataOps Project Structure.

Git workflow and Git Command Line

If you are not familiar with git, we recommend reading Git in 30 Seconds.

Naming conventions

Branches

The following branch names have special meanings in a DataOps project and should not be used for any other purpose:

  • master
  • qa
  • dev

Branches can have any other name, as long as it contains no white space or special characters, but the following best practices are strongly recommended:

  • A branch name should immediately tell other people what the branch contains
  • Where possible, a branch name should include a reference to the ticketing/project management system
  • Don't use master, qa or dev as part of other branch names. A really good branch name would be something like DATATEAM-435-add-length-of-service-to-HR-Employee-Consumption-Model
  • Remember that the branch name will be used to create a dynamic Feature Database (see below)
  • Many tutorials suggest having branch names such as feat/new-table-creation. The / character will cause problems with DataOps projects, and so should be avoided.
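The rules above can be sketched as a small validity check. This is a hypothetical helper, not part of the platform; it assumes branch names are restricted to letters, digits, and hyphens, which rules out white space, /, and other special characters:

```python
import re

RESERVED = {"master", "qa", "dev"}  # special meanings in a DataOps project

def is_valid_branch_name(name: str) -> bool:
    """Check a branch name against the conventions described above."""
    if name.lower() in RESERVED:
        return False
    # letters, digits and hyphens only: no spaces, no "/", no special chars
    return re.fullmatch(r"[A-Za-z0-9-]+", name) is not None

is_valid_branch_name("DATATEAM-435-add-length-of-service")  # True
is_valid_branch_name("feat/new-table-creation")             # False ("/")
is_valid_branch_name("master")                              # False (reserved)
```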

For full details refer to Branching Strategies.

Database structure

Database Names

The DataOps platform automatically creates Databases as needed using the following naming:

[DATABASE_NAME_PREFIX]_[DATABASE_IDENTIFIER]

DATABASE_NAME_PREFIX is set in the -ci.yml file; by default, this is DATAOPS. DATABASE_IDENTIFIER is calculated using logic in the -ci.yml file. The default behavior is:

  • If branch=master then DATABASE_IDENTIFIER=PROD and therefore full DATABASE would be something like DATAOPS_PROD
  • If branch=qa then DATABASE_IDENTIFIER=QA and therefore full DATABASE would be something like DATAOPS_QA
  • If the branch name is anything else then DATABASE_IDENTIFIER=FEATURE_[BRANCH_NAME] with everything other than alphanumeric characters removed, e.g. if the branch name is DATATEAM-435-add-length-of-service-to-HR-Employee-Consumption-Model then the full DATABASE would be DATAOPS_FEATURE_DATATEAM435ADDLENGTHOFSERVICETOHREMPLOYEECONSUMPTIONMODEL. This is at the very upper limit of a reasonable DATABASE name length, although Snowflake technically supports up to 255 characters.
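The default naming logic can be sketched as follows (an assumed simplification of what the -ci.yml does, not the platform's actual code):

```python
import re

def database_name(branch: str, prefix: str = "DATAOPS") -> str:
    """Build the database name [DATABASE_NAME_PREFIX]_[DATABASE_IDENTIFIER]."""
    if branch == "master":
        identifier = "PROD"
    elif branch == "qa":
        identifier = "QA"
    else:
        # strip everything non-alphanumeric from the branch name, upper-case it
        identifier = "FEATURE_" + re.sub(r"[^A-Za-z0-9]", "", branch).upper()
    return f"{prefix}_{identifier}"

database_name("master")  # DATAOPS_PROD
database_name("DATATEAM-435-add-length-of-service-to-HR-Employee-Consumption-Model")
# DATAOPS_FEATURE_DATATEAM435ADDLENGTHOFSERVICETOHREMPLOYEECONSUMPTIONMODEL
```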

For full details refer to Database Objects Namespacing.

Schema Names

Schema names should follow the naming convention [source|business]_[stack_name]_[mda_layer]

For example, in a system ingesting from two source systems, hr and sales (source stacks), and serving two sets of business needs (business stacks), hr and salesforecasting, the following schemas would exist:

source_hr_curation
source_hr_ingestion
source_sales_curation
source_sales_ingestion
business_hr_calculation
business_hr_consumption
business_salesforecasting_calculation
business_salesforecasting_consumption
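The list above is just the cross product of stacks and layers, which a short sketch makes explicit (the layer names per stack type are taken from the example; this is illustrative, not platform code):

```python
# source stacks get ingestion/curation layers; business stacks get
# calculation/consumption layers, per the example above
source_stacks = ["hr", "sales"]
business_stacks = ["hr", "salesforecasting"]

schemas = [f"source_{stack}_{layer}"
           for stack in source_stacks
           for layer in ("ingestion", "curation")]
schemas += [f"business_{stack}_{layer}"
            for stack in business_stacks
            for layer in ("calculation", "consumption")]
# yields the eight schemas listed above, e.g. source_hr_ingestion,
# business_salesforecasting_consumption, ...
```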

Modelling and Transformation

Directory Structure

Modelling and Transformation projects often end up with a large number of SQL and YAML files in various subdirectories.

Underneath /dataops/modelling there are several standard directories:

  • /dataops/modelling/models - this is where most of your work will be done
  • /dataops/modelling/sources - this is where you define all your ingesting sources
  • /dataops/modelling/macros - this is where custom test definitions, macros, etc. are stored. See the detailed documentation for Modelling and Transformation
  • /dataops/modelling/snapshots - this is where you will define your slowly changing dimension tables (sometimes referred to as snapshots)

Modelling Naming

Model names should follow the naming convention [schema_name]_[model_name].[sql|yml]. Following the previous example, this might produce:

/dataops/modelling/sources/hr/ingestion.yml
/dataops/modelling/sources/hr/source_hr_curation_employee.yml
/dataops/modelling/sources/hr/source_hr_curation_employee.sql
/dataops/modelling/sources/sales/ingestion.yml
/dataops/modelling/sources/sales/source_sales_curation_orders.yml
/dataops/modelling/sources/sales/source_sales_curation_orders.sql
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_calculation_salestotals.sql
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_calculation_salestotals.yml
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_calculation_salesmissed.sql
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_calculation_salesmissed.yml
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_consumption_salescommission.sql
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_consumption_salescommission.yml
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_consumption_salesperformance.sql
/dataops/modelling/models/business/salesforecasting/business_salesforecasting_consumption_salesperformance.yml

Note that there is some duplication between the directory path and the filename; this is because filenames must be unique across the whole Modelling and Transformation project.

Useful DataOps Tricks

Make a commit without running a pipeline

Include [skip ci] in your commit message, e.g.

git commit -m "updated the ingestion schedule [skip ci]"

Note that this can be done in any commit message from any Git client, not just the WebIDE.