Create Delta table from Excel File in Databricks

Requirement

In this post, we will perform a task to read an Excel file, process the data, and store the data in a Delta Table in Databricks.

Sample Data

We will take the below sample data for this exercise:

empno

ename

designation

manager

hire_date

sal

deptno

location

9369

SMITH

CLERK

7902

12/17/1980

800

20

BANGALORE

9499

ALLEN

SALESMAN

7698

2/20/1981

1600

30

HYDERABAD

9521

WARD

SALESMAN

7698

2/22/1981

1250

30

PUNE

9566

TURNER

MANAGER

7839

4/2/1981

2975

20

MUMBAI

9654

MARTIN

SALESMAN

7698

9/28/1981

1250

30

CHENNAI

9369

SMITH

CLERK

7902

12/17/1980

800

20

KOLKATA

Solution

Step 1: Install the required package:

Go to Compute from the left panel, once the page opens, click on Libraries from the menu list as shown in the screenshot:

Make sure to choose Maven Package from the drop-down as by default it would be Spark Packages.

Click on install New, an Install Library window will open. Select Maven and click on Search Packages and search for com.crealytics

Choose any version, in my case I have chosen spark-excel_2.11 based on my scala version.

Step 2: Read Excel file in Spark

After the dependent JAR installation, let’s read the excel file from the file store and load it into the dataframe.

Code

val xslFilePath = "/FileStore/tables/emp_data1.xls"
val xslDf = spark.read.format("com.crealytics.spark.excel")
                      .option("header", "true")
                      .option("dataAddress", "sheet1")
                      .load(xslFilePath)
display(xslDf)

Here,

format ->  com.crealytics.spark.excel (dependent on the installed package)

header -> true if excel file contains a header. The default value is false if not specified.

dataAddress-> specify the name of the sheet available in the excel file

 If want to check the data type of the loaded data into the dataframe:

Once you loaded data into DF. You can create the delta table using the below code:

 xslDf.write.format("delta").saveAsTable("excel_tableName")

Wrapping Up

The excel format is a very frequently used file format. With this exercise, you can read the excel file and process it.

Sharing is caring!

Subscribe to our newsletter
Loading

Leave a Reply