Create Delta Table from JSON File in Databricks

Requirement

In this post, we are going to read a JSON file using Spark and then load it into a Delta table in Databricks.

Solution

We can use the below sample data for the exercise. In our case, we have placed this file is located in FilteStore in the Databricks cluster:

 [
    {
        "empno": "9369",
        "ename": "SMITH",
        "designation": "CLERK",
        "manager": "7902",
        "hire_date": "12/17/1980",
        "sal": "800",
        "deptno": "20",
        "location": "BANGALORE"
    },
    {
        "empno": "9499",
        "ename": "ALLEN",
        "designation": "SALESMAN",
        "manager": "7698",
        "hire_date": "2/20/1981",
        "sal": "1600",
        "deptno": "30",
        "location": "HYDERABAD"
    },
    {
        "empno": "9521",
        "ename": "WARD",
        "designation": "SALESMAN",
        "manager": "7698",
        "hire_date": "2/22/1981",
        "sal": "1250",
        "deptno": "30",
        "location": "PUNE"
    },
    {
        "empno": "9566",
        "ename": "TURNER",
        "designation": "MANAGER",
        "manager": "7839",
        "hire_date": "4/2/1981",
        "sal": "2975",
        "deptno": "20",
        "location": "MUMBAI"
    },
    {
        "empno": "9654",
        "ename": "MARTIN",
        "designation": "SALESMAN",
        "manager": "7698",
        "hire_date": "9/28/1981",
        "sal": "1250",
        "deptno": "30",
        "location": "CHENNAI"
    },
    {
        "empno": "9369",
        "ename": "SMITH",
        "designation": "CLERK",
        "manager": "7902",
        "hire_date": "12/17/1980",
        "sal": "800",
        "deptno": "20",
        "location": "KOLKATA"
    }
]

We have sample data in a JSON file available at the storage account path

Step 1: Load JSON File in Dataframe

In this step, we will load the JSON file in a Spark Dataframe using the below code:

 %scala
val jsonDf = spark.read.option("multiline", "true").json("/FileStore/tables/emp_data1.json")
display(jsonDf)

Here, we have used Option to handle multiline JSON.

Step 2: Create Delta Table from Dataframe

Once we loaded the JSON data into Dataframe, we can create a delta table using the below command:

 jsonDf.write.mode("overwrite").format("delta").saveAsTable("testdb.jsonDataTable")

Data Validation

Let’s check the table whether data was loaded correctly in the delta table or not:

You can also check the table structure using below query in SQL:

 %sql
show create table testdb.jsondatatable;

Wrapping Up

In this post, we have learned how to read JSON data in a dataframe and then create a Delta table to store the data.

Sharing is caring!

Subscribe to our newsletter
Loading

Leave a Reply