Using Visual Studio Code with DataOps

As DataOps is a full-featured Git platform, it is possible to use it with any IDE that supports Git integration. A particular favorite is Visual Studio Code, which is the subject of this content, but many of the tips here will apply to other applications.

Suggested Setup

1. Install Code

If you don't already have it, VS Code can be installed from the official Visual Studio Code download page. Once installed, we recommend enabling Settings Sync to back up and synchronize your configuration across machines.

2. Install Extensions

These are some of the extensions that we like to use:

  • Better Jinja - Syntax highlighting for SQL models in dbt.
  • Better TOML - Syntax highlighting for TOML files.
  • dbt Power User - Integration with dbt for running models.
  • vscode-dbt - Snippets for autocompletion in dbt files.

3. Configure Settings

Associate SQL files with the Better Jinja extension:

  • Go to File → Preferences → Settings,
  • Search for files.associations,
  • Add the item *.sql with value jinja-sql
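In settings.json, the steps above correspond to a files.associations entry like the following (a sketch; the glob pattern maps all .sql files to the jinja-sql language mode):

```json
{
  "files.associations": {
    "*.sql": "jinja-sql"
  }
}
```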

Cloning a DataOps Project

If you are unfamiliar with cloning Git projects in VS Code, here's a quick guide:

  1. Obtain the project URL from DataOps by clicking the Clone button on the project home page, then copy the Clone with HTTPS or Clone with SSH link (including the .git extension)
  2. In VS Code, open the source control pane on the activity bar (Ctrl+Shift+G)
  3. Click Clone Repository
  4. Paste the copied project URL and hit Enter
  5. Select a file location to clone into (we recommend a dedicated directory for local clones, e.g. ~/git/)

Development Workflow Tips

If you are unfamiliar with working with Git projects remotely, here are a few pointers:

  • Switch branch - use the branch selector on the left-hand end of the lower status bar, then select your branch from the drop-down that appears.
  • Create a branch - use the branch selector (as above), then choose the option to create a branch. Note that the new branch is created from the branch that was previously active.
  • Commit changes - switch to the source control pane on the activity bar (Ctrl+Shift+G) and enter a commit message, then press Ctrl+Enter to add all untracked files and commit.
  • Trigger a pipeline run - Add a CI tag to the commit message in the form [FILENAME.yml ci], e.g. [full-ci.yml ci].
  • Push committed changes - Click the Publish Changes button next to the branch selector on the lower status bar.
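The workflow above can also be driven from the integrated terminal with the git CLI. A runnable sketch in a throwaway repository (branch, file, and author names are examples only):

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
# Create a feature branch -- it branches off the currently active branch:
git checkout -q -b feature/my-change
echo "select 1" > model.sql
git add -A
# A CI tag in the commit message triggers that pipeline file when pushed:
git commit -q -m "[full-ci.yml ci] Add first model"
# Publishing the branch is then: git push -u origin feature/my-change
```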

Modelling and Transformation Tips

DataOps Modelling and Transformation is our enhanced version of dbt. It's designed to be used from within a DataOps Orchestrator, but it's also possible to run dbt locally on your machine for rapid development. To use the dbt Power User extension to compile and run models, you will need a working local installation of dbt. We recommend running dbt in a Linux environment rather than directly on Windows, so on a Windows machine the best platform is WSL2.

Installing dbt

Please see this page for instructions on installing dbt. When you configure your profile, we suggest you name it the same as the profile specified in your project's dbt_project.yml file, typically snowflake_operations (or dlxsnowflake for older projects). Your profiles.yml may look like this:

snowflake_operations:
  target: other
  outputs:
    other:
      type: snowflake
      account: <youraccount>
      user: DATAOPS_TRANSFORMATION
      role: DATAOPS_WRITER
      password: <yourpassword>
      database: "{{ env_var('DATABASE') }}"
      schema: BASESCHEMA
      warehouse: DATAOPS_TRANSFORMATION
      threads: 8
      client_session_keep_alive: False

config:
  send_anonymous_usage_stats: False
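Because profiles.yml resolves {{ env_var('DATABASE') }} at runtime, the DATABASE environment variable must be exported before invoking dbt. A minimal sketch (DATAOPS_FB_MYFEATURE is an example feature-branch database name):

```shell
# Export the database that the profile will render at runtime:
export DATABASE=DATAOPS_FB_MYFEATURE
# dbt debug    # once dbt is installed, verifies the profile and connection
echo "DATABASE=$DATABASE"
```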

Additionally, it's best to make sure you've already run a pipeline in your feature branch before attempting to run individual models.

Getting the dataops extension module

info

This process will be changing in the near future to be easier to use.

To run a DataOps Modelling and Transformation project, you need access to the dataops extension module. In pipeline operation, it is built into the DataOps Modelling and Transformation Orchestrator; when running locally, you need a local copy. If you have Docker available, the easiest way to obtain one is to extract the module from the orchestrator image:

docker run -v /home/ubuntu/dataops:/tmp/dataops -it dataopslive/dataops-transform-runner:5-stable \
  bash -c "cp -R /app/dataops_admin/dbt_packages/dataops/* /tmp/dataops"
IMPORTANT NOTE

Change /home/ubuntu/dataops to wherever you need the dataops extension package to live. For me it was /home/guy/truedataops-5/dataops/modelling/dbt_packages/dataops.

If you don't have access locally, please contact support@dataops.live and we'll give you access to this package.

Running dbt

Running the DataOps enhanced version of dbt requires some environment variables to be set. You can spoof all of these by running:

export CI_PIPELINE_URL=null
export CI_PIPELINE_ID=manual
export CI_JOB_NAME=manual
export CI_COMMIT_REF_NAME=manual
export CI_RUNNER_TAGS=manual
export DATAOPS_DATABASE=DATAOPS_FB_MYFEATURE
IMPORTANT NOTE

Set the DATABASE variable to whichever Snowflake database you want to operate against.

Once you have done this, navigate to your dataops/modelling folder in your local terminal and start running your models, following the standard dbt run documentation and, in particular, the dbt node selection documentation.
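A few common node-selection patterns, sketched with a small wrapper that only invokes dbt when it is installed, so the example stays runnable anywhere (my_model is a hypothetical model name; note that older dbt versions use -m/--models instead of --select):

```shell
# Invokes dbt when available; otherwise prints the command it would run.
run_dbt() {
  if command -v dbt >/dev/null 2>&1; then dbt "$@"; else echo "dbt $*"; fi
}
run_dbt run --select my_model      # just this model
run_dbt run --select +my_model     # the model plus all upstream parents
run_dbt run --select my_model+     # the model plus all downstream children
```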

Putting this together will look something like this:

(Screenshot: running a dbt model from within VS Code)

Tips and Tricks

If you get errors about missing relations when running a model from the editor, you may first need to run the entire project.

Remote Connectivity

VS Code can be used to connect to a remote Linux host over SSH, giving you a development experience very similar to running Linux locally.

Please see this page for details of how to configure VS Code to connect over SSH.
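With the Remote - SSH extension installed, a host entry in your ~/.ssh/config makes the remote machine selectable directly from VS Code. A minimal sketch (host alias, address, user, and key path are all examples):

```
Host dev-box
    HostName dev.example.com
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519
```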