Requirement
In this post, we are going to read a JSON file using Spark and then load it into a Delta table in Databricks.
Solution
We can use the below sample data for the exercise. In our case, we have placed this file is located in FilteStore in the Databricks cluster:
[ { "empno": "9369", "ename": "SMITH", "designation": "CLERK", "manager": "7902", "hire_date": "12/17/1980", "sal": "800", "deptno": "20", "location": "BANGALORE" }, { "empno": "9499", "ename": "ALLEN", "designation": "SALESMAN", "manager": "7698", "hire_date": "2/20/1981", "sal": "1600", "deptno": "30", "location": "HYDERABAD" }, { "empno": "9521", "ename": "WARD", "designation": "SALESMAN", "manager": "7698", "hire_date": "2/22/1981", "sal": "1250", "deptno": "30", "location": "PUNE" }, { "empno": "9566", "ename": "TURNER", "designation": "MANAGER", "manager": "7839", "hire_date": "4/2/1981", "sal": "2975", "deptno": "20", "location": "MUMBAI" }, { "empno": "9654", "ename": "MARTIN", "designation": "SALESMAN", "manager": "7698", "hire_date": "9/28/1981", "sal": "1250", "deptno": "30", "location": "CHENNAI" }, { "empno": "9369", "ename": "SMITH", "designation": "CLERK", "manager": "7902", "hire_date": "12/17/1980", "sal": "800", "deptno": "20", "location": "KOLKATA" } ]
We have sample data in a JSON file available at the storage account path
Step 1: Load JSON File in Dataframe
In this step, we will load the JSON file in a Spark Dataframe using the below code:
%scala val jsonDf = spark.read.option("multiline", "true").json("/FileStore/tables/emp_data1.json") display(jsonDf)
Here, we have used Option to handle multiline JSON.
Step 2: Create Delta Table from Dataframe
Once we loaded the JSON data into Dataframe, we can create a delta table using the below command:
jsonDf.write.mode("overwrite").format("delta").saveAsTable("testdb.jsonDataTable")
Data Validation
Let’s check the table whether data was loaded correctly in the delta table or not:
You can also check the table structure using below query in SQL:
%sql show create table testdb.jsondatatable;
Wrapping Up
In this post, we have learned how to read JSON data in a dataframe and then create a Delta table to store the data.