Requirement
Do you want to explore Spark? Azure provides a cloud service named Azure Databricks, which is built on top of Apache Spark. In this post, we are going to create a Databricks cluster in Azure.
Solution
Follow the steps below to create a Databricks cluster in Azure.
Step 1: Login to Azure Portal
Go to portal.azure.com and log in with your credentials.
Step 2: Search for Databricks
Search for "databricks" and click on Azure Databricks. It will take you to the Azure Databricks service page.
Currently, we don't have any Databricks service. You can create one by clicking either +Add or the Create azure databricks service option.
Step 3: Create Databricks service in Azure
Part I: Basics
Under Basics, choose the subscription and resource group (if one is not available, create a new one).
Now, provide a workspace name and choose the location and pricing tier. You will get the below options for the pricing tier.
Pricing Tier
- Standard (Apache Spark, Secure with Azure AD)
- Premium (+Role based access control)
- Trial (Premium, 14 days)
Part II: Networking
You can choose whether or not you want to deploy the workspace in your own virtual network (VNet).
Part III: Tags
Under Tags, you can provide any name and a value against it to categorize the resource.
Part IV: Review + Create
Now, if all looks fine, click on Create.
It will take a few minutes to complete. Once the deployment is complete, click on Go to resource.
The Databricks home page provides easy ways to get started: explore a quickstart tutorial, import/export existing Databricks scripts, create a new notebook, browse the documentation, and review common tasks.
Step 4: Create Databricks cluster
Let's create a new cluster on the Azure Databricks platform. Here, we will set up the configuration.
Go to Clusters from the left sidebar.
Currently, we don’t have any existing cluster. Let’s create a new one.
Below is the configuration for the cluster setup. This is the least expensive cluster configuration; a quick way to verify these values from a notebook is sketched after the table.
| Configuration | Value/Version |
| --- | --- |
| Cluster Name | Any name |
| Cluster Mode | Standard |
| Pool | None |
| Databricks Runtime Version | 5.5 LTS (Scala 2.11, Spark 2.4.3) |
| Python Version | 3 |
| Autopilot Options | Autoscaling disabled |
| Worker Node | Standard_DS3_v2 x 2 [14.0 GB Memory, 4 Cores, 0.75 DBU] |
| Driver Node | Same as worker type |
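Once the cluster is up and a notebook is attached to it (notebooks are covered in Step 5), a couple of one-liners can confirm the values above. This is just a sanity-check sketch; `spark` and `sc` are the SparkSession and SparkContext that Databricks notebooks predefine:

```scala
// Databricks notebooks predefine `spark` (SparkSession) and `sc` (SparkContext).
println(spark.version)         // expect "2.4.3" on the 5.5 LTS runtime
println(sc.defaultParallelism) // typically 2 workers x 4 cores = 8
```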
Advanced Options
Spark: We can set any Spark configuration here for performance tuning; I left it blank for now. The entries are newline-separated key-value pairs (see the sketch after this list). For Python, it will show the default path.
Tags: By default, you will see a few predefined tags here; the ClusterId tag gets its value only after the cluster is created. In addition to the default tags, you can add new tags as well.
Logging & Init Scripts: I didn't make any changes in these two parts.
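As an illustration of what such a Spark config entry does, here is a minimal sketch of reading and overriding a setting from a notebook. The key `spark.sql.shuffle.partitions` is used purely as an example; the post itself leaves the Spark config box empty:

```scala
// Read a Spark setting; values supplied in the cluster's Spark config box
// surface here (spark.sql.shuffle.partitions is just an example key).
println(spark.conf.get("spark.sql.shuffle.partitions"))

// Override it for this session only; the cluster-level value is untouched.
spark.conf.set("spark.sql.shuffle.partitions", "8")
```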
Once you click on Create Cluster, it will take a few minutes for the cluster to come up. It will show as Running after completion.
Step 5: Create Notebook
Go to Workspace from the left sidebar. You will see two options: Users and Shared.
Users: a private workspace for each user.
Shared: a collaborative workspace for the team.
To create a notebook, right-click and choose Create > Notebook.
It prompts you to pick a default language; you can choose any of them. This only sets the notebook's default; cells can still be written in other languages in the same notebook using language magic commands, as sketched below.
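For instance, in a notebook whose default language is Scala, a cell that starts with a magic command such as `%python`, `%sql`, `%r`, or `%md` runs in that language instead. A small sketch (the cell contents are illustrative):

```scala
// Cell 1: runs in the notebook's default language (Scala), no magic needed.
println("Hello from Scala")

// Cell 2 (a separate notebook cell) can switch languages by starting
// with a magic command, e.g.:
//   %python
//   print("Hello from Python")
```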
Step 6: Create DataFrame in Notebook
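Using the notebook created above, here is a minimal Scala sketch of creating a DataFrame in a cell; the column names and sample rows are made up for illustration:

```scala
// Databricks Scala notebooks predefine `spark` and pre-import its implicits;
// in a standalone Spark application you would need: import spark.implicits._

// Build a small DataFrame from an in-memory sequence (illustrative data).
val people = Seq(
  ("Alice", 34),
  ("Bob", 28),
  ("Carol", 45)
).toDF("name", "age")

people.printSchema() // inferred schema: name (string), age (int)
people.show()

// A simple transformation with the DataFrame API.
people.filter($"age" > 30).show()
```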
Wrapping Up
In this post, we created a Databricks cluster with a specific configuration. We also created a notebook and built a DataFrame with the Scala API. We can also use other languages in the notebook, such as Python, R, and SQL.