Deployment Steps

Deploy Azure Databricks in your Azure virtual network (VNet injection)

Sign in to the Azure portal - https://portal.azure.com/

Create a virtual network

Step 1: From the Azure portal menu, select Create a resource. Then select Networking > Virtual network.

Step 2: Under Create virtual network, apply the following settings:

| Setting | Suggested value | Description |
|---|---|---|
| Subscription | <Your subscription> | Select the Azure subscription that you want to use. |
| Resource group | databricks-project | Select Create New and enter a new resource group name for your account. |
| Name | databricks-vnet | Select a name for your virtual network. |
| Region | <Select the region that is closest to your users> | Select a geographic location where you can host your virtual network. Use the location that's closest to your users. |

Select Next: IP Addresses > and apply the following settings. Then select Review + create.

| Setting | Suggested value | Description |
|---|---|---|
| IPv4 address space | 10.26.1.0/24 | The virtual network's address range in CIDR notation. The CIDR range must be between /16 and /24. |
| Subnet name | default | Select a name for the default subnet in your virtual network. |
| Subnet address range | 10.26.1.0/24 | The subnet's address range in CIDR notation. It must be contained by the address space of the virtual network. The address range of a subnet which is in use can't be edited. |
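The address constraints described above can be checked before you fill in the portal form. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `check_vnet_plan` is hypothetical, and the /16 to /24 bounds are taken from the description above rather than queried from Azure.

```python
import ipaddress


def check_vnet_plan(address_space: str, subnet: str) -> list[str]:
    """Return a list of problems found in a VNet address plan (empty if none)."""
    problems = []
    vnet = ipaddress.ip_network(address_space)
    sub = ipaddress.ip_network(subnet)

    # The VNet range must be between /16 and /24 (per the settings above).
    if not 16 <= vnet.prefixlen <= 24:
        problems.append(f"{vnet} is outside the /16 to /24 range")

    # The subnet must be contained by the VNet address space.
    if not sub.subnet_of(vnet):
        problems.append(f"{sub} is not contained in {vnet}")

    return problems


if __name__ == "__main__":
    # Values suggested in this article.
    print(check_vnet_plan("10.26.1.0/24", "10.26.1.0/24"))
    # A subnet that falls outside the address space.
    print(check_vnet_plan("10.26.1.0/24", "10.26.2.0/24"))
```

An empty list means the plan satisfies both checks. This only validates the arithmetic locally and does not replace the validation the portal performs.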

Create an Azure Databricks workspace

From the Azure portal menu, select Create a resource. Then select Analytics > Databricks.

Under Azure Databricks Service, apply the following settings:

| Setting | Suggested value | Description |
|---|---|---|
| Workspace name | databricks-workspace | Select a name for your Azure Databricks workspace. |
| Subscription | <Your subscription> | Select the Azure subscription that you want to use. |
| Resource group | databricks-project | Select the same resource group you used for the virtual network. |
| Location | <Select the region that is closest to your users> | Choose the same location as your virtual network. |
| Pricing Tier | Choose between Standard or Premium. | For more information on pricing tiers, see the Databricks pricing page. |

Once you've finished entering settings on the Basics page, select Next: Networking > and apply the following settings:

| Setting | Suggested value | Description |
|---|---|---|
| Deploy Azure Databricks workspace in your Virtual Network (VNet) | Yes | This setting allows you to deploy an Azure Databricks workspace in your virtual network. |
| Virtual Network | databricks-vnet | Select the virtual network you created in the previous section. |
| Public Subnet Name | | Use the default public subnet name. |
| Public Subnet CIDR Range | 10.26.1.0/25 | Use a CIDR range up to and including /25. |
| Private Subnet Name | | Use the default private subnet name. |
| Private Subnet CIDR Range | 10.26.1.128/25 | Use a CIDR range up to and including /25. |
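The two suggested /25 ranges are the two halves of the 10.26.1.0/24 address space. As a small illustration (a sketch using Python's standard `ipaddress` module, with variable names chosen here for readability), the split can be derived and checked for overlap like this:

```python
import ipaddress

# Address space of the virtual network used in this article.
vnet = ipaddress.ip_network("10.26.1.0/24")

# Extending the prefix by one bit splits the /24 into exactly two /25 networks.
public_subnet, private_subnet = vnet.subnets(prefixlen_diff=1)

print(public_subnet)    # 10.26.1.0/25
print(private_subnet)   # 10.26.1.128/25

# The two ranges must not overlap each other.
print(public_subnet.overlaps(private_subnet))   # False

# Total addresses in each /25 block, before any platform reservations.
print(private_subnet.num_addresses)   # 128
```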

Once the deployment is complete, navigate to the Azure Databricks resource. Notice that virtual network peering is disabled. Also notice the resource group and managed resource group on the Overview page.

The managed resource group is not modifiable, and it is not used to create virtual machines. You can only create virtual machines in the resource group you manage.

When a workspace deployment fails, the workspace is still created in a failed state. Delete the failed workspace and create a new workspace that resolves the deployment errors. When you delete the failed workspace, the managed resource group and any successfully deployed resources are also deleted.

Create a cluster

Return to your Azure Databricks service and select Launch Workspace on the Overview page.

Select Clusters > + Create Cluster. Then enter a cluster name, like databricks-project-cluster, and accept the remaining default settings. Select Create Cluster.

Once the cluster is running, return to the managed resource group in the Azure portal. Notice the new virtual machines, disks, IP addresses, and network interfaces. A network interface with an IP address is created in each of the public and private subnets.

Return to your Azure Databricks workspace and select the cluster you created. Then navigate to the Executors tab on the Spark UI page. Notice that the addresses for the driver and the executors are in the private subnet range. In this example, the driver is 10.26.1.132 and the executors are 10.26.1.133 and 10.26.1.134. Your IP addresses could be different.
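To confirm which subnet an address belongs to, you can compare it against the two CIDR ranges configured earlier. The sketch below uses Python's standard `ipaddress` module with the example addresses from this article. The helper name `subnet_for` and the labels in `SUBNETS` are hypothetical.

```python
import ipaddress

# Subnet ranges configured on the Networking page earlier in this article.
SUBNETS = {
    "public": ipaddress.ip_network("10.26.1.0/25"),
    "private": ipaddress.ip_network("10.26.1.128/25"),
}


def subnet_for(address: str) -> str:
    """Return the label of the subnet containing the address, or 'unknown'."""
    ip = ipaddress.ip_address(address)
    for label, network in SUBNETS.items():
        if ip in network:
            return label
    return "unknown"


# Driver and executor addresses from the example above.
for node in ("10.26.1.132", "10.26.1.133", "10.26.1.134"):
    print(node, subnet_for(node))
```

Substitute the addresses shown on your own Executors tab, since they will likely differ from the ones in this example.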

Clean up resources

After you have finished the article, you can terminate the cluster. To do so, select Clusters from the left pane of the Azure Databricks workspace. For the cluster you want to terminate, move the cursor over the ellipsis under the Actions column and select the Terminate icon. This stops the cluster.

If you do not manually terminate the cluster, it stops automatically once it has been inactive for the specified time, provided you selected the Terminate after __ minutes of inactivity checkbox while creating the cluster.
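If you create clusters programmatically rather than through the UI, the inactivity timeout can be set in the request body. The following is a minimal sketch that only builds and prints such a body, assuming the `autotermination_minutes` field of the Databricks Clusters REST API. The function name `build_cluster_spec`, the 30-minute timeout, the worker count, and the placeholder runtime and node type values are illustrative assumptions, not values from this article.

```python
import json


def build_cluster_spec(name: str, idle_minutes: int) -> dict:
    """Build a cluster-creation request body with auto-termination set."""
    return {
        "cluster_name": name,
        # Placeholders: choose a runtime version and VM size
        # that are available in your own workspace.
        "spark_version": "<spark-version>",
        "node_type_id": "<node-type-id>",
        "num_workers": 2,
        # Terminate the cluster after this many minutes of inactivity.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("databricks-project-cluster", idle_minutes=30)
print(json.dumps(spec, indent=2))
```

The sketch stops short of sending the request, so it needs no workspace URL or access token to run.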

If you do not wish to reuse the cluster, you can delete the resource group you created in the Azure portal.

