How to Use Visual Studio Code with DataOps
As DataOps.live is a full-featured Git platform, you can use it with any IDE that supports Git integration. A particular favorite is Visual Studio Code (VS Code), which is the subject of this guide, but many of the tips here will also apply to other applications.
While you can continue to use your own locally installed version of VS Code, we recommend using DDE Cloud (DataOps Development Environment with cloud deployment) instead.
Suggested setup
1. Install VS Code
If you don't already have VS Code, visit Download Visual Studio Code to install it. Once installed, we recommend enabling Settings Sync to back up and synchronize all your configuration.
2. Install extensions
These are some of the extensions that we like to use:
- Better Jinja - Syntax highlighting for SQL models in dbt.
- Better TOML - Syntax highlighting for TOML files.
- dbt Power User - Integration with dbt for running models.
- vscode-dbt - Snippets for autocompletion in dbt files.
3. Configure settings
Associate SQL files with the Better Jinja extension:
- Open File->Preferences->Settings
- Search for `files.associations`
- Add the item `*.sql` with the value `jinja-sql`
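If you prefer editing settings directly, the same association can go in your `settings.json` (a minimal sketch; VS Code's settings file accepts comments):

```json
{
  // Treat all .sql files as Jinja-templated SQL so Better Jinja highlights them
  "files.associations": {
    "*.sql": "jinja-sql"
  }
}
```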
Cloning a DataOps project
If you are unfamiliar with cloning Git projects in VS Code, here's a quick guide:
- Obtain the project URL from DataOps by clicking the Clone button on the project home page, then copy the Clone with HTTPS or Clone with SSH link (including the `.git` extension).
- In VS Code, open the Source Control pane on the activity bar (Ctrl+Shift+G).
- Click Clone Repository.
- Paste the copied project URL and hit Enter.
- Select a file location to clone into (we recommend a dedicated directory for local clones, e.g. `~/git/`).
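If you prefer the terminal, the same clone is a single command (the URL below is a hypothetical example; use the link you copied from DataOps):

```bash
# Clone the project into a dedicated directory for local clones
git clone https://app.dataops.live/my-group/my-project.git ~/git/my-project
```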
Development workflow tips
If you are unfamiliar with working with Git projects remotely, here are a few pointers (equivalent terminal commands are sketched after the list):
- Switch branch - use the branch selector at the left-hand end of the lower status bar, then select your branch from the drop-down that appears.
- Create a branch - use the branch selector (as above), then use the option to create a branch. Note that the new branch is created from the branch that was previously active.
- Commit changes - switch to the Source Control pane on the activity bar (Ctrl+Shift+G) and enter a commit message, then press Ctrl+Enter to add all untracked files and commit.
- Trigger a pipeline run - add a CI tag to the commit message in the form `[FILENAME.yml ci]`, e.g. `[full-ci.yml ci]`.
- Push committed changes - click the Publish Changes button next to the branch selector on the lower status bar.
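For reference, a rough terminal equivalent of the workflow above (branch name and commit message are hypothetical; the `[full-ci.yml ci]` tag in the commit message is what triggers the pipeline):

```bash
git checkout -b my-feature                        # create a branch from the current one
git add .                                         # stage all changes, including untracked files
git commit -m "Update models [full-ci.yml ci]"    # commit and request a pipeline run
git push -u origin my-feature                     # publish the branch to DataOps
```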
Modelling and Transformation Engine tips
DataOps Modelling and Transformation is our enhanced version of dbt. It's designed to be used from within a DataOps Orchestrator, but it's possible to run dbt locally on your machine for rapid development. To use the dbt Power User extension to compile and run models, you will need a working, local installation of dbt. We recommend running dbt in a Linux environment rather than directly on Windows; on Windows, the best platform is WSL2.
Installing dbt
Please see this page for instructions on installing dbt. When you configure your profile, we suggest naming it the same as the profile specified in your project's `dbt_project.yml` file, typically `snowflake_operations` (or `dlxsnowflake` for older projects). Your `profiles.yml` may look like this:
```yaml
snowflake_operations:
  target: other
  outputs:
    other:
      type: snowflake
      account: <youraccount>
      user: DATAOPS_TRANSFORMATION
      role: DATAOPS_WRITER
      password: <yourpassword>
      database: "{{ env_var('DATABASE') }}"
      schema: BASESCHEMA
      warehouse: DATAOPS_TRANSFORMATION
      threads: 8
      client_session_keep_alive: False

config:
  send_anonymous_usage_stats: False
```
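With the profile in place, you can sanity-check the connection before running any models (a minimal sketch; the database name is a hypothetical feature-branch database, and `DATABASE` must be set because the profile reads it via `env_var`):

```bash
export DATABASE=DATAOPS_FB_MYFEATURE   # hypothetical; match your feature-branch database
dbt debug --profile snowflake_operations --target other
```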
Additionally, it's best to ensure you've already run a pipeline in your feature branch before attempting to run individual models.
Getting the `dataops` package
This process will be changing soon to be easier to use.
To run a DataOps Modelling and Transformation project, you need access to the `dataops` extension package. In pipeline orchestration, this is built into the DataOps Modelling and Transformation Orchestrator, but when running locally you need a local copy. If you have Docker available, the best way to get one is to extract it from the orchestrator image:
```bash
docker run -v /home/ubuntu/dataops:/tmp/dataops -it dataopslive/dataops-transform-runner:5-stable bash -c "cp -R /app/dataops_admin/dbt_packages/dataops/* /tmp/dataops"
```
Change `/home/ubuntu/dataops` to wherever you need the `dataops` extension package to live. For me, it was `/home/guy/truedataops-5/dataops/modelling/dbt_packages/dataops`.
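To confirm the extraction worked, check that the package files are where dbt expects them (the project path below is a hypothetical example):

```bash
ls ~/git/my-project/dataops/modelling/dbt_packages/dataops
```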
If you don't have Docker available locally, please contact support@dataops.live and we'll give you access to this package.
Running dbt
Running the DataOps-enhanced version of dbt requires some environment variables to be set. You can spoof all of these by running the following:
```bash
export CI_PIPELINE_URL=null
export CI_PIPELINE_ID=manual
export CI_JOB_NAME=manual
export CI_COMMIT_REF_NAME=manual
export CI_RUNNER_TAGS=manual
export DATAOPS_DATABASE=DATAOPS_FB_MYFEATURE
```
Set the database variable (`DATAOPS_DATABASE` above, plus `DATABASE` if your `profiles.yml` reads it via `env_var`) to whichever Snowflake database you want to operate against.
Once you have done this in your local terminal, navigate to your `dataops/modelling` folder and start running your models, following the standard dbt run documentation and, in particular, the dbt node selection documentation.
Putting this together will look something like this (a minimal sketch; the model selector is a hypothetical example):
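```bash
# Assumes the environment variables above are set and the dataops package is in place
cd dataops/modelling
dbt run --profile snowflake_operations                            # build every model in the project
dbt run --profile snowflake_operations --models stg_customers+    # or one model plus its children
```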
Tips and tricks
If you get errors about missing relations when running a model from the editor, you may first need to run the entire project.
Remote connectivity
VS Code can be used to connect to a remote Linux host over SSH, giving you a development experience very similar to running Linux locally.
For details, read configuring VS Code to connect over SSH.
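As a quick sketch, VS Code's Remote - SSH extension picks up hosts from your `~/.ssh/config`, so an entry like this (host alias, address, and key path are all hypothetical) makes the host selectable from the command palette:

```
Host dataops-dev
    HostName dev.example.com
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519
```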