Requirement
In this post, we are going to create a mount point in Azure Databricks to access data in Azure Data Lake. This is a one-time activity. Once the mount point for the storage is created, we can use it directly to access the files.
Earlier, in one of our posts, we created a mount point for ADLS Gen2 without an SPN. Here, we will use a Service Principal and OAuth to create the mount.
Prerequisites
- Azure Account
- Azure Data Lake Storage (ADLS Gen2 Storage)
- Key Vault
- Azure Databricks
- Service Principal
Solution
Let’s first create a Service Principal for authorization.
Step 1: Create Service Principal (SPN)
In the last post, we learned how to create a Service Principal in Azure. You can read that post for more details:
Create Service Principal in Azure
Step 2: Create Secret Scope in Azure Databricks
Please refer to this post: Create Secret Scope in Azure Databricks.
Step 3: Get App Client Id & Secrets
In this step, we will get the client ID of the app we created and create a secret for it.
Get the client id from App:
Create App Secret
Go to Certificates & secrets under Manage in the left pane and click + New client secret to create a secret for this app.
Note: Here, we have set the expiry to Never, but it is good practice to renew the secret from time to time.
Once you click on Add, a new client secret will be created. You can copy and use it wherever needed.
Step 4: Set App Client Id & Secrets in Key Vault
In this step, we store the client ID and the client secret created in the previous step in Key Vault for better security.
Likewise, you can create the secret for the client secret.
Step 5: Add Policy for App in Key Vault
Once you have created the Key Vault secrets for the app's client ID and client secret, add an access policy that authorizes the app to access the Key Vault.
Click on Access Policy in the left pane of the Key Vault resource, and click on + Add Access Policy.
Here, select an appropriate template from the Configure from template drop-down and select the registered app as the principal.
Once you add the access policy, you will see it on the access policy page.
Step 6: Create Mount using Service Principal & OAuth
Once you have completed all the steps above, use the template below to create the mount point.
// OAuth settings for the ABFS driver; the client ID and secret are read from the secret scope
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> dbutils.secrets.get(scope = "databricks-secret-scope", key = "access-app-key"),
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "databricks-secret-scope", key = "access-app-secret"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/ee9b56f1-d84f-43e7-9f6d-98645e72d1f1/oauth2/token/")

// Mount the ADLS Gen2 container at /mnt/bdpdatalake
dbutils.fs.mount(
  source = "abfss://blob-container@bdpgen2datalake.dfs.core.windows.net",
  mountPoint = "/mnt/bdpdatalake",
  extraConfigs = configs)
Here, the placeholders take the following form:
fs.azure.account.oauth2.client.endpoint: https://login.microsoftonline.com/<app_tenant_id>/oauth2/token/
source: abfss://<file_system>@<storage_name>.dfs.core.windows.net
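To make the two placeholder patterns concrete, here is a minimal sketch that builds the endpoint, the source URI, and the configuration map from their parts. The object name `AdlsMount` and its three functions are hypothetical helpers introduced only for illustration; they are not part of Databricks or of the template above, and the secret scope and key names in the usage comment are the ones used in this post.

```scala
// Hypothetical helpers (not a Databricks API): assemble the strings used in Step 6.
object AdlsMount {
  // OAuth 2.0 token endpoint for the tenant that owns the registered app.
  def tokenEndpoint(tenantId: String): String =
    s"https://login.microsoftonline.com/${tenantId}/oauth2/token/"

  // ABFS URI for a file system (container) in an ADLS Gen2 storage account.
  def abfssSource(fileSystem: String, storageName: String): String =
    s"abfss://${fileSystem}@${storageName}.dfs.core.windows.net"

  // The same five settings as the template, with the credentials passed in.
  def oauthConfigs(clientId: String, clientSecret: String, tenantId: String): Map[String, String] =
    Map(
      "fs.azure.account.auth.type" -> "OAuth",
      "fs.azure.account.oauth.provider.type" ->
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      "fs.azure.account.oauth2.client.id" -> clientId,
      "fs.azure.account.oauth2.client.secret" -> clientSecret,
      "fs.azure.account.oauth2.client.endpoint" -> tokenEndpoint(tenantId)
    )
}

// Possible use in a Databricks Scala notebook, where dbutils is predefined:
//   val configs = AdlsMount.oauthConfigs(
//     dbutils.secrets.get(scope = "databricks-secret-scope", key = "access-app-key"),
//     dbutils.secrets.get(scope = "databricks-secret-scope", key = "access-app-secret"),
//     "<app_tenant_id>")
//   dbutils.fs.mount(
//     source = AdlsMount.abfssSource("blob-container", "bdpgen2datalake"),
//     mountPoint = "/mnt/bdpdatalake",
//     extraConfigs = configs)
```

Keeping the string building in plain functions means the tenant ID, container, and storage account name each appear in one place, which can help when the same notebook is reused across environments.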
Step 7: Mount Validation
To list everything under /mnt, including the new mount point, run either of the commands below in a notebook cell:
%scala
dbutils.fs.ls("/mnt")

%fs ls /mnt
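Listing /mnt shows what is there, but not which storage location sits behind a mount. The sketch below looks up the source for a given mount point. `Mount` and `MountCheck` are hypothetical names introduced for illustration; in a notebook, the entries would come from `dbutils.fs.mounts()`, whose results expose `mountPoint` and `source`.

```scala
// Hypothetical model of one mount entry: the mount point and the URI behind it.
final case class Mount(mountPoint: String, source: String)

object MountCheck {
  // Returns the source mounted at `mountPoint`, or None if nothing is mounted there.
  def sourceOf(mounts: Seq[Mount], mountPoint: String): Option[String] =
    mounts.find(_.mountPoint == mountPoint).map(_.source)
}

// Possible use in a Databricks Scala notebook, where dbutils is predefined:
//   val mounts = dbutils.fs.mounts().map(m => Mount(m.mountPoint, m.source))
//   MountCheck.sourceOf(mounts, "/mnt/bdpdatalake") match {
//     case Some(src) => println(s"/mnt/bdpdatalake -> $src")
//     case None      => println("/mnt/bdpdatalake is not mounted")
//   }
```

A check like this can also be run before Step 6 to skip the mount call when the mount point already exists.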
Wrapping Up
In this post, we learned about the Service Principal and how to use it with OAuth to create a mount point for Azure Data Lake. With this approach, the app credentials stay in Key Vault and the secret scope rather than in the notebook code.
…