Requirement
In this post, we will perform a task to read an Excel file, process the data, and store the data in a Delta Table in Databricks.
![]()
Sample Data
We will take the below sample data for this exercise:
empno | ename | designation | manager | hire_date | sal | deptno | location |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
Solution
Step 1: Install the required package:
Go to Compute from the left panel, once the page opens, click on Libraries from the menu list as shown in the screenshot:
![]()
![]()
Make sure to choose Maven Package from the drop-down as by default it would be Spark Packages.
![]()
Click on install New, an Install Library window will open. Select Maven and click on Search Packages and search for com.crealytics
![]()
Choose any version, in my case I have chosen spark-excel_2.11 based on my scala version.
![]()
Step 2: Read Excel file in Spark
After the dependent JAR installation, let’s read the excel file from the file store and load it into the dataframe.
Code
val xslFilePath = "/FileStore/tables/emp_data1.xls"
val xslDf = spark.read.format("com.crealytics.spark.excel")
.option("header", "true")
.option("dataAddress", "sheet1")
.load(xslFilePath)
display(xslDf)Here,
format -> com.crealytics.spark.excel (dependent on the installed package)
header -> true if excel file contains a header. The default value is false if not specified.
dataAddress-> specify the name of the sheet available in the excel file
![]()
If want to check the data type of the loaded data into the dataframe:
![]()
Once you loaded data into DF. You can create the delta table using the below code:
xslDf.write.format("delta").saveAsTable("excel_tableName")Wrapping Up
The excel format is a very frequently used file format. With this exercise, you can read the excel file and process it.