Create Mount Point in Azure Databricks

Requirement

We have data in Azure Data Lake (blob storage) and want to read and process it using Spark in Databricks. Before the data can be read and processed, we need to set up access to the Azure Data Lake.

In this post, we are going to create a mount point in Azure Databricks to access the Azure Data Lake. This is a one-time activity: once the mount point for the blob storage is created, we can use it directly to access the files.

Prerequisite

For this post, it is required to have:

1. An Azure Data Lake Storage Gen2 account (here, bdpgen2datalake)
2. An Azure Key Vault
3. An Azure Databricks workspace with a Key Vault-backed secret scope (here, databricks-secret-scope)

Solution

Step 1: Create a container in Azure Data Lake Gen2 Storage

Here, we are creating a container named blob-container.

Create a folder named blob-storage inside the container.

Note: An empty folder cannot be created in blob storage. First upload a file into the container, then create the folder and copy the file into it.
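
If you prefer to script this step instead of using the portal, the container and folder can also be created with the azure-storage-blob Python SDK. This is a minimal sketch; the connection string placeholder and the sample.txt file name are assumptions for illustration:

 from azure.storage.blob import BlobServiceClient

 # Assumed: connection string copied from the storage account's Access keys blade
 service = BlobServiceClient.from_connection_string("<connection-string>")

 # Create the container (raises ResourceExistsError if it already exists)
 service.create_container("blob-container")

 # Blob storage has no real folders: uploading a blob whose name contains "/"
 # makes the blob-storage folder appear in the portal
 blob = service.get_blob_client("blob-container", "blob-storage/sample.txt")
 blob.upload_blob(b"placeholder contents")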

Step 2: Get ADLS Gen2 Access Key

In the storage account, go to Access keys in the left panel and copy one of the keys.

Step 3: Create Secret for Access Key in Azure Key Vault

Create a secret named blob-container-key and store the key value copied in the last step.
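
This step can also be scripted with the Azure SDK for Python. A minimal sketch, assuming the azure-keyvault-secrets and azure-identity packages and a vault named <vault-name>:

 from azure.identity import DefaultAzureCredential
 from azure.keyvault.secrets import SecretClient

 # Assumed: credentials are picked up from the environment by DefaultAzureCredential
 client = SecretClient(
     vault_url="https://<vault-name>.vault.azure.net/",
     credential=DefaultAzureCredential(),
 )

 # Store the storage account access key copied in Step 2
 client.set_secret("blob-container-key", "<access-key-from-step-2>")

Note that the Databricks code in the next step reads this secret through a Key Vault-backed secret scope (databricks-secret-scope), which must already exist in the workspace.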

Step 4: Create Mount in Azure Databricks

Databricks provides a dbutils method to create a mount point. The code below is a sample template for creating a mount point using the Scala programming language:

 dbutils.fs.mount(
   source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>",
   mountPoint = "/mnt/<mount-name>",
   extraConfigs = Map("<conf-key>" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>"))
 )

Here, we have to replace the following placeholders with actual values:

<container-name>
  Value: blob-container
  Description: ADLS Gen2 container name

<storage-account-name>
  Value: bdpgen2datalake
  Description: ADLS Gen2 storage account name

<directory-name>
  Value: blob-storage
  Description: folder name under the container

<mount-name>
  Value: blob-storage (any name of your choice)
  Description: the mount point; it can be anything

<conf-key>
  Value: fs.azure.account.key.<storage-account-name>.blob.core.windows.net or fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net
  Description: access configuration key for the storage account; replace the variables embedded in the key itself as well

<scope-name>
  Value: databricks-secret-scope
  Description: Databricks secret scope name

<key-name>
  Value: blob-container-key
  Description: the secret in Key Vault that holds the storage account key

The actual code for creating the mount in Databricks:

Scala:

 dbutils.fs.mount(
  source = "wasbs://blob-container@bdpgen2datalake.blob.core.windows.net/blob-storage",
  mountPoint = "/mnt/blob-storage",
  extraConfigs = Map("fs.azure.account.key.bdpgen2datalake.blob.core.windows.net" -> 
                     dbutils.secrets.get(scope = "databricks-secret-scope", 
                                         key = "blob-container-key")))

Python:

 dbutils.fs.mount(
  source = "wasbs://blob-container@bdpgen2datalake.blob.core.windows.net/blob-storage",
  mount_point = "/mnt/blob-storage",
  extra_configs = {"fs.azure.account.key.bdpgen2datalake.blob.core.windows.net": 
                    dbutils.secrets.get(scope = "databricks-secret-scope", 
                                        key = "blob-container-key")})

Step 5: List Created Mount Point
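
To verify the mount, list all mount points in the workspace, or the files under the new mount:

 display(dbutils.fs.mounts())                 # all mount points, including /mnt/blob-storage
 display(dbutils.fs.ls("/mnt/blob-storage"))  # files under the new mount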

You can also unmount the created mount point using the below command:

 dbutils.fs.unmount("/mnt/blob-storage")
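
Note that dbutils.fs.mount fails if the target path is already mounted, so when rerunning the notebook a common pattern is to unmount first. A minimal sketch:

 # Unmount /mnt/blob-storage only if it is currently mounted, so reruns do not fail
 if any(m.mountPoint == "/mnt/blob-storage" for m in dbutils.fs.mounts()):
     dbutils.fs.unmount("/mnt/blob-storage")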

Wrapping Up

In this post, we have learned how to create a mount point for Azure Blob Storage in Azure Databricks. You can use this mount point to access any file available in the same Azure container and folder.
