Preinstalled CLI Tools in DataOps.live Develop
Here is a list of the command-line utilities that ship by default with the DataOps development environment.
The tools installed in the DataOps development environment are subject to change without warning.
asdf
This command-line tool is a version manager for multiple language runtimes, letting you switch easily between versions. It comes pre-installed in the development environment and can manage versions of languages like Python, Ruby, Node.js, and more.
Configuration/Authentication
None required.
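Usage example
A minimal sketch of typical asdf commands (commands may vary with the asdf version installed, and the Python version shown is just an example):
asdf plugin add python
asdf install python 3.11.4
asdf global python 3.11.4
After this, python resolves to the selected version in your shell.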
Additional information
Read more about The Multiple Runtime Version Manager.
aws CLI
This unified tool lets you manage AWS services from the command line. It comes pre-installed in the development environment and covers a wide range of services, including EC2 instances, S3 buckets, and more.
Configuration/Authentication
AWS credentials are required.
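Usage example
A quick way to verify that your credentials are being picked up (this assumes valid AWS credentials are configured via environment variables or ~/.aws/credentials):
aws sts get-caller-identity
You can then list your S3 buckets, for example:
aws s3 ls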
Additional information
Read more about AWS Command Line Interface.
before_script.sh
Resides at: /dataops-cde/scripts/
This is a cut-down version of the full before_script used in a DataOps pipeline, mainly to populate key environment-specific variables.
It is designed to be used by other parts of the development environment to ensure that the correct target account, database, etc., are being targeted. However, it can be useful to run it directly to ensure the correct variables are being populated.
DATAOPS_DATABASE is the most critical of these.
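For example, to confirm just the database variable, you can filter the script's output:
/dataops-cde/scripts/before_script.sh | grep DATAOPS_DATABASE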
Configuration/Authentication
None required.
Usage example
/dataops-cde/scripts/before_script.sh
Produces:
DATAOPS_PREFIX=DATAOPS_TDO_22
DATAOPS_ENV_NAME=FB_DP_VERSIONING
DATAOPS_ENV_NAME_PROD=PROD
DATAOPS_ENV_NAME_QA=QA
DATAOPS_ENV_NAME_DEV=DEV
DATAOPS_BRANCH_NAME_PROD=main
DATAOPS_BRANCH_NAME_QA=qa
DATAOPS_BRANCH_NAME_DEV=dev
DATAOPS_DATABASE=DATAOPS_TDO_22_FB_DP_VERSIONING
DATABASE=DATAOPS_TDO_22_FB_DP_VERSIONING
DATAOPS_DATABASE_MASTER=DATAOPS_TDO_22_PROD
DATAOPS_NONDB_ENV_NAME=FB_DP_VERSIONING
DATAOPS_BEFORE_SCRIPT_HAS_RUN=YES
Additional information
Read more about the DataOps custom before script.
If you have created a custom before_script.sh in your DataOps project and want to use it within your DataOps development environment, contact our Support team.
dataops CLI
Configuration/Authentication
None required.
Usage example
dataops sole gen "create schema SALES_RECORD.SALES comment='This is a test DataOps.live schema.';"
Produces:
dataops INFO Reading config from create schema SALES_RECORD.SALES comment='This is a test DataOps.live schema.';
dataops INFO Extracting data from a raw DDL query.
dataops INFO Parsing schema object: SALES.
databases:
  SALES_RECORD:
    schemas:
      SALES:
        comment: This is a test DataOps.live schema.
        is_managed: false
        is_transient: false
        manage_mode: all
dataops INFO Written output to stdout
Additional information
Read more about the DataOps CLI.
dataops-render
Resides at: /dataops-cde/scripts/
This is the same render engine that runs in jobs within a pipeline to render the .template.xxx files.
You can learn more about it in the DataOps template rendering section.
Configuration/Authentication
None required.
Usage example
/home/gitpod/.pyenv/versions/3.8.13/bin/python /dataops-cde/scripts/dataops-render -o --file $STREAMLIT_FOLDER/orders/app.template.py render-template
Produces:
/workspace/truedataops-22/dataops/streamlit/orders/app.py [exists, overwritten]
Additional information
Read more about the DataOps template rendering.
dbt CLI
Configuration/Authentication
~/.dbt/profiles.yml is built/rebuilt every time dbt is run, using the following variables:
- DATAOPS_MATE_SECRET_ACCOUNT
- DATAOPS_MATE_SECRET_PASSWORD
- DATAOPS_MATE_SECRET_USER
- DATAOPS_MATE_ROLE
- DATAOPS_MATE_WAREHOUSE
The old variable names (those starting with DBT_) are still compatible and functional. However, we recommend transitioning to the new DATAOPS_MATE_ names in your setup to stay current with the latest updates.
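For example, to point MATE at different credentials before running dbt, you could export these variables in your shell first (all values below are placeholders):
export DATAOPS_MATE_SECRET_ACCOUNT="myorg-myaccount"
export DATAOPS_MATE_SECRET_USER="MY_USER"
export DATAOPS_MATE_SECRET_PASSWORD="********"
export DATAOPS_MATE_ROLE="MY_ROLE"
export DATAOPS_MATE_WAREHOUSE="MY_WH"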
Usage example
cd dataops/modelling && dbt parse
Produces:
15:11:22 Running with dbt=1.2.1
15:11:22 Start parsing.
15:11:22 Dependencies loaded
15:11:22 ManifestLoader created
15:11:22 Manifest loaded
15:11:22 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 3 unused configuration paths:
- models.MyTemplate.dataops_meta
- models.MyTemplate.dataops_meta.pipelines
- snapshots
15:11:22 Manifest checked
15:11:23 Flat graph built
15:11:23 Manifest loaded
15:11:23 Performance info: target/perf_info.json
15:11:23 Done.
Additional information
Read more about the dbt CLI.
glab CLI
Configuration/Authentication
If the environment variable DATAOPS_ACCESS_TOKEN is set, glab is authenticated automatically.
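You can confirm the authentication state at any time with:
glab auth status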
Usage example
glab ci list
Produces:
Showing 30 pipelines on dataops-demo-project/truedataops-22 (Page 1)
(success) • #851831 kaggledev (about 1 day ago)
(success) • #851817 kaggledev (about 1 day ago)
(success) • #851748 kaggledev (about 1 day ago)
(success) • #851730 kaggledev (about 1 day ago)
(success) • #851483 kaggledev (about 1 day ago)
(canceled) • #851482 kaggledev (about 1 day ago)
(success) • #851422 kaggledev (about 1 day ago)
(success) • #851415 kaggledev (about 1 day ago)
(canceled) • #851413 kaggledev (about 1 day ago)
(success) • #851284 kaggledev (about 1 day ago)
(canceled) • #851282 kaggledev (about 1 day ago)
(canceled) • #851281 kaggledev (about 1 day ago)
(canceled) • #851279 kaggledev (about 1 day ago)
(canceled) • #851276 kaggledev (about 1 day ago)
(skipped) • #851275 kaggledev (about 1 day ago)
(manual) • #851264 kaggledev (about 1 day ago)
(manual) • #851075 kaggledev (about 1 day ago)
(failed) • #851074 kaggle_dev (about 1 day ago)
(manual) • #851038 dp_versioning (about 1 day ago)
(manual) • #850819 dp_versioning_sean (about 1 day ago)
(manual) • #850766 dpversioning (about 1 day ago)
(manual) • #849322 dp_versioning (about 1 day ago)
(manual) • #849268 dp_versioning (about 1 day ago)
(manual) • #849172 dp_versioning (about 1 day ago)
(manual) • #848585 dp_versioning (about 1 day ago)
(manual) • #848390 dp_versioning (about 2 days ago)
(manual) • #848388 dp_versioning (about 2 days ago)
(manual) • #848380 dp_versioning (about 2 days ago)
(manual) • #848313 dp_versioning (about 2 days ago)
(manual) • #847459 dp_versioning (about 2 days ago)
Additional information
Read more about the GitLab CLI.
jupyter-notebook
This Visual Studio Code (VS Code) extension comes pre-installed in the DataOps development environment. It allows you to create, edit, and run shareable notebooks directly from within VS Code.
With the Jupyter notebook extension, you can easily manage your notebooks, run code cells, and view the output directly in VS Code. You can also use the extension to launch Jupyter servers and manage your kernel environments.
Configuration/Authentication
Configuration or authentication is only required if your notebook's code needs it, and you should handle it within the notebook itself.
Additional information
Read more about the Jupyter Notebooks in VS Code.
pre-commit
This tool helps identify issues before committing changes to your project. Pre-commit hooks are scripts or commands that run automatically before a commit.
Configuration/Authentication
Configure pre-commit hooks by creating a .pre-commit-config.yaml file at your project's root. This file describes which repositories and hooks are installed.
Here is an example configuration:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-yaml
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
This example config file specifies two hooks:
- check-yaml: checks that all YAML files are valid
- detect-secrets: checks that no secrets are committed to the repository
For more information on these particular hooks, see their respective projects on GitHub.
Usage
To enable pre-commit for your project, run:
pre-commit install
Now the tool will run automatically on every commit against changed files.
Remember, to enable the tool, you must run pre-commit install every time you clone a project.
To run the hooks manually across all files, not just files that have changed, run:
pre-commit run --all-files
You can update your hooks to the latest version automatically by running:
pre-commit autoupdate
This will update your .pre-commit-config.yaml with the latest version for each repo.
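You can also run a single hook by its id, for example:
pre-commit run detect-secrets --all-files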
Additional information
Read more about using pre-commit hooks in your DataOps project.
Read more about the full official pre-commit documentation.
snowpark
Snowpark allows you to write code in your preferred language and run it directly on Snowflake. The DataOps development environment (DevReady) comes fully pre-configured with all the main libraries and tools required for Snowpark development and testing.
Configuration/Authentication
Snowflake credentials are required.
Usage examples
See Snowpark use cases for usage examples.
Additional information
Read more about Snowpark.
snowsql CLI
Configuration/Authentication
None required. The DataOps development environment sets the SNOWSQL_* environment variables based on the current MATE credentials.
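If you need to target a different account or user, snowsql's standard connection flags still apply (the values below are placeholders):
snowsql -a myorg-myaccount -u MY_USER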
Usage example
snowsql -q "select * from STARSCHEMA.DIM_CUSTOMER limit 5;"
Produces:
* SnowSQL * v1.2.26
Type SQL statements or !help
+-------+---------------------+-----------+--------------+------------------+-------------------+------------+---------------+--------------+--------+---------------+----------------------+-----------------+----------------+-----------------+---------------+-------------------+---------------------+
| TITLE | FULLNAME | ADDRESSID | PHONENUMBER | TOTALPURCHASEYTD | DATEFIRSTPURCHASE | BIRTHDATE | MARITALSTATUS | YEARLYINCOME | GENDER | TOTALCHILDREN | NUMBERCHILDRENATHOME | EDUCATION | OCCUPATION | NUMBERCARSOWNED | HOMEOWNERFLAG | COUNTRYREGIONCODE | NAME |
|-------+---------------------+-----------+--------------+------------------+-------------------+------------+---------------+--------------+--------+---------------+----------------------+-----------------+----------------+-----------------+---------------+-------------------+---------------------|
| Mr. | David R. Robinett | 29177 | 238-555-0100 | 16.01 | 2003-09-01 | 1961-02-23 | M | 25001-50000 | M | 4 | 0 | Graduate Degree | Clerical | 0 | 1 | DE | Nordrhein-Westfalen |
| Ms. | Rebecca A. Robinson | 16400 | 648-555-0100 | 4 | 2004-06-05 | 1965-06-11 | M | 50001-75000 | F | 3 | 3 | Bachelors | Professional | 1 | 1 | AU | Victoria |
| Ms. | Dorothy B. Robinson | 19867 | 423-555-0100 | 4730.04 | 2002-04-07 | 1954-09-23 | S | 75001-100000 | M | 2 | 0 | Partial College | Skilled Manual | 2 | 0 | AU | Victoria |
| Ms. | Carol Ann F. Rockne | 17009 | 439-555-0100 | 2435.4018 | 2001-10-27 | 1943-07-15 | M | 25001-50000 | M | 1 | 0 | Bachelors | Clerical | 0 | 1 | GB | England |
| Mr. | Scott M. Rodgers | 13003 | 989-555-0100 | 1647 | 2002-04-18 | 1968-05-15 | M | 50001-75000 | M | 2 | 2 | Bachelors | Professional | 1 | 1 | AU | Queensland |
+-------+---------------------+-----------+--------------+------------------+-------------------+------------+---------------+--------------+--------+---------------+----------------------+-----------------+----------------+-----------------+---------------+-------------------+---------------------+
5 Row(s) produced. Time Elapsed: 0.124s
Goodbye!
Additional information
Read more about snowsql.
sole-validate
This tool helps you validate your SOLE configuration locally before running a SOLE pipeline in the data product platform. It allows you to catch potential SOLE configuration issues before running the pipeline, reducing your development time.
Configuration/Authentication
None required.
Usage examples
See SOLE compilation and validation for usage examples.
Additional information
Read more about SOLE compilation and validation.
sqlfluff
A SQL linter (and fixer) as documented here.
Configuration/Authentication
None required.
Usage example
To run linting and report on the results:
cd dataops/modelling/models
sqlfluff lint --dialect snowflake
To run linting, excluding certain rules and report on the results:
cd dataops/modelling/models
sqlfluff lint --dialect snowflake --exclude-rules L036,L014
SQLFluff can also be used to fix certain issues:
cd dataops/modelling/models
sqlfluff fix --dialect snowflake --exclude-rules L036,L014
When you run the fix command, it may change many models from the current directory downwards, potentially hundreds or even thousands. Remember, you only need to stage and commit the modified models you want. You should also rerun any changed models before committing, to validate that they still work as expected.
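To limit the scope of a fix, you can pass an individual file or subdirectory instead of running against everything (the path below is illustrative):
sqlfluff fix --dialect snowflake my_model.sql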
sqlfmt
A SQL formatter as documented here.
Configuration/Authentication
None required.
Usage example
cd dataops/modelling/models
sqlfmt .
Running the formatter changes all the models from the current directory downwards, potentially hundreds or even thousands. Remember, you only need to stage and commit the modified models you want. You should also rerun any changed models before committing, to validate that they still work as expected.
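To preview the changes without writing them, you can run the formatter in diff mode (assuming the installed sqlfmt supports the --diff flag):
sqlfmt --diff .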
streamlit
Configuration/Authentication
Configuration or authentication is done within your Streamlit application.
Usage example
cd dataops/streamlit/orders/ && streamlit run app.py
Produces:
Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.
You can now view your Streamlit app in your browser.
Network URL: http://10.0.5.2:8501
External URL: http://18.202.147.41:8501
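If the default port 8501 is already in use, you can choose another one (the port number here is arbitrary):
streamlit run app.py --server.port 8502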
See Streamlit use cases for usage examples.
Additional information
Read more about Streamlit.