Step-by-step guide to setting up Databricks in Visual Studio Code

By Poorna Cooray

--

Recently, Databricks announced a Databricks extension for Visual Studio Code (VS Code). This extension enables developers to write code locally, leveraging the powerful authoring capabilities of the IDE, while connecting to Databricks clusters to run the code remotely.

With this new extension, we can:

· Synchronize local code in VS Code with code in Databricks workspaces.

· Run local Python (.py) files from VS Code on Databricks clusters.

· Run Python, R, Scala, and SQL notebooks (.py, .ipynb, .r, .scala, .sql) from VS Code as automated Databricks jobs.

This article is a step-by-step guide on how to set up the Databricks extension in VS Code and run locally written code on Databricks clusters.

Before using the Databricks extension in VS Code, there are some requirements that need to be met in your Azure Databricks workspace and on your local machine. You can refer to this link to ensure that all the necessary requirements are met before setting up the extension.
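
For example, you can quickly verify the local tooling from a terminal; the exact minimum versions are listed on the requirements page linked above:

    code --version      # VS Code must be installed and on the PATH
    python --version    # the extension requires a local Python 3 interpreter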

Once all the workspace and local machine requirements are met, you can start setting up Databricks in VS Code by following the steps below.

Step 1: Install and set up VS Code

First, ensure that you have VS Code installed and set up on your local machine. You can use the link below to download and install VS Code.

Installation Link: Visual Studio Code — Code Editing. Redefined

Step 2: Install the Databricks extension

After setting up VS Code, you need to install the Databricks extension. You can install it by searching for “Databricks” in the Extensions Marketplace.
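
Alternatively, if the VS Code command-line tool is on your PATH, the extension can be installed from a terminal. This assumes databricks.databricks is the extension's marketplace identifier:

    code --install-extension databricks.databricks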

Once installed, the Databricks extension will be visible in the left-hand panel, as shown below.

Step 3: Configure Databricks

Once the Databricks extension is installed, we must configure the Databricks workspace before connecting to a cluster. Click on the Databricks extension in the left-hand panel and then select “Configure Databricks”.

This will open a prompt requesting the Databricks host path.

The Databricks host path is the URL shown in your browser when you open your Databricks workspace.
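
For example, an Azure Databricks host typically looks like the following, where the workspace identifier is a placeholder:

    https://adb-1234567890123456.7.azuredatabricks.net/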

After entering the Databricks host, you will be asked to authenticate using either the Azure CLI or a Databricks profile.
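
If you choose the Azure CLI route, authentication is a single sign-in from a terminal, assuming the Azure CLI is installed:

    az login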

Step 4: Authenticate Databricks

In this article, we will focus on authenticating with a Databricks profile.

Create a new Databricks profile by adding the host path and an access token to the .databrickscfg file that opens up.
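
A minimal profile in .databrickscfg looks like the sketch below; the host and token values are placeholders for your own workspace URL and personal access token:

    [DEFAULT]
    host = https://adb-1234567890123456.7.azuredatabricks.net/
    token = dapi1234567890abcdef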

Note: A new access token can be generated by selecting “User Settings” in Databricks and clicking “Generate token”.

Once the profile is created, click “Configure Databricks” again, and the created profile will be listed. Select it. Once connected to Databricks, the following can be seen under the Databricks configuration.

Step 5: Attach cluster

You will notice that no cluster is attached yet. You can connect to any cluster you have access to by clicking the gear icon next to “Cluster” and selecting the cluster of your choice.

Once connected, the cluster configuration appears as follows.

Make sure that the cluster is started and running before proceeding further. If the cluster is running, it should show as follows.
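
If you also have the Databricks CLI installed and configured, a quick way to confirm the cluster state outside the IDE is the following; this assumes the legacy Databricks CLI:

    databricks clusters list    # lists cluster IDs, names, and states (e.g., RUNNING)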

Step 6: Create a Databricks Repo

To use this extension, we need to create a repo in Databricks and sync it with the local machine. You can access Repos from your Databricks workspace by selecting “Repos”.

Create a new repo by selecting the gear icon next to “Sync Destination” and clicking “Create New Sync Destination”.

Name your repo and press Enter.

Check in Databricks whether the repo has been created. The created repo should show as below.

Step 7: Run Python files on the Databricks cluster

Once your local code is synced to the repo on Databricks, you can run Python files on the Databricks cluster as shown below.
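
As a quick test, a minimal Python file such as the sketch below can be run on the cluster. It assumes the Databricks runtime provides Spark and obtains the session via getOrCreate():

    # demo.py - a minimal sketch to verify the cluster connection
    from pyspark.sql import SparkSession

    # On a Databricks cluster, this returns the existing Spark session
    spark = SparkSession.builder.getOrCreate()

    # Build a tiny DataFrame and print it to confirm the code runs remotely
    spark.range(10).show()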

Happy coding!

References:

Databricks ❤️ IDEs — The Databricks Blog

Databricks extension for Visual Studio Code — Azure Databricks | Microsoft Learn

https://www.youtube.com/watch?v=tSb8eXxvRWs
