Requirement
In this post, we will perform a task to read an Excel file, process the data, and store the data in a Delta Table in Databricks.
Sample Data
We will take the below sample data for this exercise:
empno | ename | designation | manager | hire_date | sal | deptno | location |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
Solution
Step 1: Install the required package:
Go to Compute from the left panel, once the page opens, click on Libraries from the menu list as shown in the screenshot:
Make sure to choose Maven Package from the drop-down as by default it would be Spark Packages.
Click on install New, an Install Library window will open. Select Maven and click on Search Packages and search for com.crealytics
Choose any version, in my case I have chosen spark-excel_2.11 based on my scala version.
Step 2: Read Excel file in Spark
After the dependent JAR installation, let’s read the excel file from the file store and load it into the dataframe.
Code
val xslFilePath = "/FileStore/tables/emp_data1.xls" val xslDf = spark.read.format("com.crealytics.spark.excel") .option("header", "true") .option("dataAddress", "sheet1") .load(xslFilePath) display(xslDf)
Here,
format -> com.crealytics.spark.excel (dependent on the installed package)
header -> true if excel file contains a header. The default value is false if not specified.
dataAddress-> specify the name of the sheet available in the excel file
If want to check the data type of the loaded data into the dataframe:
Once you loaded data into DF. You can create the delta table using the below code:
xslDf.write.format("delta").saveAsTable("excel_tableName")
Wrapping Up
The excel format is a very frequently used file format. With this exercise, you can read the excel file and process it.