
Preinstalled CLI Tools in DataOps.live Develop

Feature release status: PubPrev

Here is a list of the command-line utilities that ship by default with the DataOps development environment.

danger

The tools installed in the DataOps development environment are subject to change without warning.

asdf

This command-line tool is a version manager for multiple languages that lets you switch easily between different versions. It comes preinstalled in the development environment and can manage versions of languages such as Python, Ruby, Node.js, and more.

Configuration/Authentication

None required.
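Usage example

A typical asdf workflow looks like this (the Python version shown is purely illustrative):

```shell
# Show installed plugins, then install and pin a runtime version
asdf plugin list
asdf install python 3.11.4   # version number is illustrative
asdf global python 3.11.4    # set the default version for your user
asdf current python          # confirm the active version
```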

Additional information

Read more about The Multiple Runtime Version Manager.

aws CLI

This unified tool allows you to manage AWS services from the command line. It comes pre-installed in the development environment and can be used to manage a wide range of AWS services, including EC2 instances, S3 buckets, and more.

Configuration/Authentication

AWS credentials are required.
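One way to supply credentials is via environment variables (all values below are placeholders; `aws configure` is an alternative), after which any service command works:

```shell
# Placeholder credentials -- substitute your own or use `aws configure`
export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
export AWS_DEFAULT_REGION="eu-west-1"

# Verify the credentials resolve to an identity, then list S3 buckets
aws sts get-caller-identity
aws s3 ls
```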

Additional information

Read more about AWS Command Line Interface.

before_script.sh

Resides at: /dataops-cde/scripts/

This is a cut-down version of the full before_script used in a DataOps pipeline, mainly to populate key environment-specific variables.

It is designed to be used by other parts of the development environment to ensure that the correct account, database, and other targets are set. However, it can also be useful to run directly to verify that the expected variables are populated.

DATAOPS_DATABASE is the most critical of these.

Configuration/Authentication

None required.

Usage example

/dataops-cde/scripts/before_script.sh

Produces:

DATAOPS_PREFIX=DATAOPS_TDO_22
DATAOPS_ENV_NAME=FB_DP_VERSIONING
DATAOPS_ENV_NAME_PROD=PROD
DATAOPS_ENV_NAME_QA=QA
DATAOPS_ENV_NAME_DEV=DEV
DATAOPS_BRANCH_NAME_PROD=main
DATAOPS_BRANCH_NAME_QA=qa
DATAOPS_BRANCH_NAME_DEV=dev
DATAOPS_DATABASE=DATAOPS_TDO_22_FB_DP_VERSIONING
DATABASE=DATAOPS_TDO_22_FB_DP_VERSIONING
DATAOPS_DATABASE_MASTER=DATAOPS_TDO_22_PROD
DATAOPS_NONDB_ENV_NAME=FB_DP_VERSIONING
DATAOPS_BEFORE_SCRIPT_HAS_RUN=YES
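If you want these variables available in your current shell rather than only printed, you can source the script instead of executing it (a sketch, assuming the script exports the variables it reports):

```shell
# Source rather than execute so the variables persist in this shell
source /dataops-cde/scripts/before_script.sh
echo "Target database: $DATAOPS_DATABASE"
```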

Additional information

Read more about the DataOps custom before script.

If you have created a custom before_script.sh in your DataOps project and want to use that within your DataOps development environment, don't hesitate to get in touch with our Support team.

dataops CLI

This tool provides DataOps utilities from the command line, such as generating SOLE configuration from raw DDL (dataops sole gen), as shown in the example below.

Configuration/Authentication

None required.

Usage example

dataops sole gen "create schema SALES_RECORD.SALES comment='This is a test DataOps.live schema.';"

Produces:

dataops INFO Reading config from create schema SALES_RECORD.SALES comment='This is a test DataOps.live schema.';
dataops INFO Extracting data from a raw DDL query.
dataops INFO Parsing schema object: SALES.
databases:
  SALES_RECORD:
    schemas:
      SALES:
        comment: This is a test DataOps.live schema.
        is_managed: false
        is_transient: false
        manage_mode: all
dataops INFO Written output to stdout

Additional information

Read more about the DataOps CLI.

dataops-render

Resides at: /dataops-cde/scripts/

This is the same render engine that runs in jobs within a pipeline to render the .template.xxx files.

You can learn more about it in the DataOps template rendering section.

Configuration/Authentication

None required.

Usage example

/home/gitpod/.pyenv/versions/3.8.13/bin/python /dataops-cde/scripts/dataops-render -o --file $STREAMLIT_FOLDER/orders/app.template.py render-template

Produces:

/workspace/truedataops-22/dataops/streamlit/orders/app.py [exists, overwritten]

Additional information

Read more about the DataOps template rendering.

dbt CLI

Configuration/Authentication

~/.dbt/profiles.yml is built/rebuilt every time dbt is run using the following variables:

  • DATAOPS_MATE_SECRET_ACCOUNT
  • DATAOPS_MATE_SECRET_PASSWORD
  • DATAOPS_MATE_SECRET_USER
  • DATAOPS_MATE_ROLE
  • DATAOPS_MATE_WAREHOUSE
note

The old variable names (those starting with DBT_) remain compatible and functional. However, we recommend transitioning to the new names starting with DATAOPS_MATE_ to stay current with the latest updates.
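As a sketch, setting these variables manually and verifying the resulting connection might look like this (all values are placeholders; the development environment normally provides them for you):

```shell
# Placeholder values -- normally populated by the development environment
export DATAOPS_MATE_SECRET_ACCOUNT="<account-identifier>"
export DATAOPS_MATE_SECRET_USER="<username>"
export DATAOPS_MATE_SECRET_PASSWORD="<password>"
export DATAOPS_MATE_ROLE="<role>"
export DATAOPS_MATE_WAREHOUSE="<warehouse>"

# ~/.dbt/profiles.yml is rebuilt from these variables on each run
cd dataops/modelling && dbt debug
```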

Usage example

cd dataops/modelling && dbt parse

Produces:

15:11:22  Running with dbt=1.2.1
15:11:22 Start parsing.
15:11:22 Dependencies loaded
15:11:22 ManifestLoader created
15:11:22 Manifest loaded
15:11:22 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 3 unused configuration paths:
- models.MyTemplate.dataops_meta
- models.MyTemplate.dataops_meta.pipelines
- snapshots

15:11:22 Manifest checked
15:11:23 Flat graph built
15:11:23 Manifest loaded
15:11:23 Performance info: target/perf_info.json
15:11:23 Done.

Additional information

Read more about the dbt CLI.

glab CLI

Configuration/Authentication

If the environment variable DATAOPS_ACCESS_TOKEN is set, then glab is automatically authenticated.
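For example, to set the token for the current shell and confirm authentication (the token value is a placeholder):

```shell
# Placeholder token value -- use a personal access token
export DATAOPS_ACCESS_TOKEN="<your-access-token>"
# Confirm the CLI is authenticated
glab auth status
```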

Usage example

glab ci list

Produces:

Showing 30 pipelines on dataops-demo-project/truedataops-22 (Page 1)

(success) • #851831 kaggledev (about 1 day ago)
(success) • #851817 kaggledev (about 1 day ago)
(success) • #851748 kaggledev (about 1 day ago)
(success) • #851730 kaggledev (about 1 day ago)
(success) • #851483 kaggledev (about 1 day ago)
(canceled) • #851482 kaggledev (about 1 day ago)
(success) • #851422 kaggledev (about 1 day ago)
(success) • #851415 kaggledev (about 1 day ago)
(canceled) • #851413 kaggledev (about 1 day ago)
(success) • #851284 kaggledev (about 1 day ago)
(canceled) • #851282 kaggledev (about 1 day ago)
(canceled) • #851281 kaggledev (about 1 day ago)
(canceled) • #851279 kaggledev (about 1 day ago)
(canceled) • #851276 kaggledev (about 1 day ago)
(skipped) • #851275 kaggledev (about 1 day ago)
(manual) • #851264 kaggledev (about 1 day ago)
(manual) • #851075 kaggledev (about 1 day ago)
(failed) • #851074 kaggle_dev (about 1 day ago)
(manual) • #851038 dp_versioning (about 1 day ago)
(manual) • #850819 dp_versioning_sean (about 1 day ago)
(manual) • #850766 dpversioning (about 1 day ago)
(manual) • #849322 dp_versioning (about 1 day ago)
(manual) • #849268 dp_versioning (about 1 day ago)
(manual) • #849172 dp_versioning (about 1 day ago)
(manual) • #848585 dp_versioning (about 1 day ago)
(manual) • #848390 dp_versioning (about 2 days ago)
(manual) • #848388 dp_versioning (about 2 days ago)
(manual) • #848380 dp_versioning (about 2 days ago)
(manual) • #848313 dp_versioning (about 2 days ago)
(manual) • #847459 dp_versioning (about 2 days ago)

Additional information

Read more about the GitLab CLI.

jupyter-notebook

This Visual Studio Code (VS Code) extension comes pre-installed in the DataOps development environment. It allows you to create, edit, and run shareable notebooks directly from within VS Code.

With the Jupyter notebook extension, you can easily manage your notebooks, run code cells, and view the output directly in VS Code. You can also use the extension to launch Jupyter servers and manage your kernel environments.

Configuration/Authentication

Authentication is only required if your notebook itself needs it, depending on what it does; if so, configure it within the notebook or application code.

Additional information

Read more about the Jupyter Notebooks in VS Code.

pre-commit

This tool helps identify issues before committing changes to your project. Pre-commit hooks are scripts or commands that run automatically before a commit.

Configuration/Authentication

Configure pre-commit hooks by creating a pre-commit config file at your project's root. The pre-commit config file describes what repositories and hooks are installed.

Here is an example configuration:

.pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-yaml
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets

This example config file specifies two hooks:

  • check-yaml: checks that all YAML files are valid
  • detect-secrets: checks that no secrets are committed to the repository

For more information on these particular hooks, see their respective projects on GitHub.

Usage

To enable pre-commit for your project run:

pre-commit install

Now the tool will run automatically on every commit against changed files.

note

Remember, to enable the tool, you must run pre-commit install every time you clone a project.

To run the hooks manually across all files, not just files that have changed, run:

pre-commit run --all-files
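You can also run a single hook by its id, for example only the YAML check from the sample configuration above:

```shell
# Run just the check-yaml hook against every file in the repository
pre-commit run check-yaml --all-files
```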

You can update your hooks to the latest version automatically by running:

pre-commit autoupdate

This will update your .pre-commit-config.yaml with the latest version for each repo.

Additional information

Read more about using pre-commit hooks in your DataOps project.

Read more about the full official pre-commit documentation.

snowpark

Snowpark allows you to write code in your preferred language and run it directly on Snowflake. The DevReady environment comes fully preconfigured with the main libraries and tools required for Snowpark development and testing.

Configuration/Authentication

Snowflake credentials are required.

Usage examples

See Snowpark use cases for usage examples.

Additional information

Read more about Snowpark.

snowsql CLI

Configuration/Authentication

None required. The DataOps development environment sets the SNOWSQL environment variables based on the current MATE credentials.
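For reference, SnowSQL reads connection settings from the standard SNOWSQL_* environment variables; set manually, they would look like this (all values are placeholders, and you normally never set them yourself):

```shell
# Placeholder values -- the development environment sets these from the
# current MATE credentials
export SNOWSQL_ACCOUNT="<account-identifier>"
export SNOWSQL_USER="<username>"
export SNOWSQL_PWD="<password>"
export SNOWSQL_ROLE="<role>"
export SNOWSQL_WAREHOUSE="<warehouse>"
export SNOWSQL_DATABASE="<database>"
```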

Usage example

snowsql -q "select * from STARSCHEMA.DIM_CUSTOMER limit 5;"

Produces:

* SnowSQL * v1.2.26
Type SQL statements or !help
+-------+---------------------+-----------+--------------+------------------+-------------------+------------+---------------+--------------+--------+---------------+----------------------+-----------------+----------------+-----------------+---------------+-------------------+---------------------+
| TITLE | FULLNAME | ADDRESSID | PHONENUMBER | TOTALPURCHASEYTD | DATEFIRSTPURCHASE | BIRTHDATE | MARITALSTATUS | YEARLYINCOME | GENDER | TOTALCHILDREN | NUMBERCHILDRENATHOME | EDUCATION | OCCUPATION | NUMBERCARSOWNED | HOMEOWNERFLAG | COUNTRYREGIONCODE | NAME |
|-------+---------------------+-----------+--------------+------------------+-------------------+------------+---------------+--------------+--------+---------------+----------------------+-----------------+----------------+-----------------+---------------+-------------------+---------------------|
| Mr. | David R. Robinett | 29177 | 238-555-0100 | 16.01 | 2003-09-01 | 1961-02-23 | M | 25001-50000 | M | 4 | 0 | Graduate Degree | Clerical | 0 | 1 | DE | Nordrhein-Westfalen |
| Ms. | Rebecca A. Robinson | 16400 | 648-555-0100 | 4 | 2004-06-05 | 1965-06-11 | M | 50001-75000 | F | 3 | 3 | Bachelors | Professional | 1 | 1 | AU | Victoria |
| Ms. | Dorothy B. Robinson | 19867 | 423-555-0100 | 4730.04 | 2002-04-07 | 1954-09-23 | S | 75001-100000 | M | 2 | 0 | Partial College | Skilled Manual | 2 | 0 | AU | Victoria |
| Ms. | Carol Ann F. Rockne | 17009 | 439-555-0100 | 2435.4018 | 2001-10-27 | 1943-07-15 | M | 25001-50000 | M | 1 | 0 | Bachelors | Clerical | 0 | 1 | GB | England |
| Mr. | Scott M. Rodgers | 13003 | 989-555-0100 | 1647 | 2002-04-18 | 1968-05-15 | M | 50001-75000 | M | 2 | 2 | Bachelors | Professional | 1 | 1 | AU | Queensland |
+-------+---------------------+-----------+--------------+------------------+-------------------+------------+---------------+--------------+--------+---------------+----------------------+-----------------+----------------+-----------------+---------------+-------------------+---------------------+
5 Row(s) produced. Time Elapsed: 0.124s
Goodbye!

Additional information

Read more about snowsql.

sole-validate

This tool helps you validate your SOLE configuration locally before running a SOLE pipeline in the data product platform. It allows you to catch potential SOLE configuration issues before running the pipeline, reducing your development time.

Configuration/Authentication

None required.

Usage examples

See SOLE compilation and validation for usage examples.

Additional information

Read more about SOLE compilation and validation.

sqlfluff

A SQL linter (and fixer); see the SQLFluff documentation for full details.

Configuration/Authentication

None required.

Usage example

To run linting and report on the results:

cd dataops/modelling/models
sqlfluff lint --dialect snowflake

To run linting, excluding certain rules and report on the results:

cd dataops/modelling/models
sqlfluff lint --dialect snowflake --exclude-rules L036,L014

SQLFluff can also be used to fix certain issues:

cd dataops/modelling/models
sqlfluff fix --dialect snowflake --exclude-rules L036,L014
note

When you run the fix command, it may modify many models from the current directory downwards, potentially hundreds or even thousands. Stage and commit only the modified models you want, and rerun any changed models before committing to validate that they still work as expected.
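Rather than repeating flags on every invocation, the same defaults can live in a .sqlfluff file at the project root (a sketch using the options from the examples above):

```ini
[sqlfluff]
dialect = snowflake
exclude_rules = L036,L014
```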

sqlfmt

A SQL formatter; see the sqlfmt documentation for full details.

Configuration/Authentication

None required.

Usage example

cd dataops/modelling/models
sqlfmt .
note

Running the formatter changes all the models from the current directory downwards, potentially hundreds or even thousands. Stage and commit only the modified models you want, and rerun any changed models before committing to validate that they still work as expected.
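Before formatting in place, you can preview what would change (flags per the sqlfmt documentation; verify them against your installed version):

```shell
cd dataops/modelling/models
# Show a diff of proposed changes without writing them
sqlfmt --diff .
# Exit non-zero if any file would be reformatted (useful in CI)
sqlfmt --check .
```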

streamlit

Configuration/Authentication

Configuration or authentication is done within your Streamlit application.

Usage example

cd dataops/streamlit/orders/ && streamlit run app.py

Produces:

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.


You can now view your Streamlit app in your browser.

Network URL: http://10.0.5.2:8501
External URL: http://18.202.147.41:8501

See Streamlit use cases for usage examples.

Additional information

Read more about Streamlit.