Requirement
In the last post, we imported a CSV file and created a Delta table in Databricks. In this post, we extend the same exercise with partitioning. Partitioning is an important concept when it comes to storing data in tables, and here we are going to create a partitioned Delta table from a CSV file in Databricks.
If you want to know more about partitioning, you can read the Partitioning in Hive post.
Solution
Let’s use the same sample data:
| empno | ename | designation | manager | hire_date | sal | deptno | location |
| ----- | ------ | ----------- | ------- | ---------- | ---- | ------ | --------- |
| 9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
| 9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
| 9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
| 9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
| 9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
| 9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
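For reference, the raw file behind this table would look roughly like the lines below, assuming a header row and comma separators (matching the read options used in Step 1):

empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD

and so on for the remaining rows.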
Step 1: Load CSV in Dataframe
First, we have to read the data from the CSV file into a dataframe. Here is the code:
%scala
val file_location = "/FileStore/tables/emp_data1-3.csv"

val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

display(df)
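Before writing the table, it is worth confirming that inferSchema picked up sensible column types. A quick optional check (not part of the original steps):

%scala
// Print the schema Spark inferred from the CSV header and data.
df.printSchema()

// Confirm the number of rows loaded from the file.
println(s"Rows loaded: ${df.count()}")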
Step 2: Create Delta Table with Partition
Once the data is available in the dataframe, we can write it directly to any existing or new table.
df.write.format("delta").partitionBy("location").saveAsTable("testdb.emp_partition_tbl")
Here, we have saved the data in Delta format, partitioned by the location column.
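On storage, Delta writes one subdirectory per distinct location value. If you want to see that layout, you can list the table directory; a minimal sketch, assuming a managed table under the default Hive warehouse path (adjust the path for your workspace):

%scala
// Each partition appears as a location=<value> subdirectory.
// The path below is an assumption for a managed table in testdb.
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl"))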
Step 3: Check Partition in Table
Use the command below to check the partitions in the table:
%sql
SHOW PARTITIONS testdb.emp_partition_tbl
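Based on the sample data, this should list one partition per distinct location value (BANGALORE, HYDERABAD, PUNE, MUMBAI, CHENNAI, and KOLKATA). The real payoff is partition pruning: filtering on the partition column lets Delta skip the non-matching directories entirely. A minimal sketch of such a query:

%scala
// Only the location=BANGALORE partition directory is scanned here.
display(spark.table("testdb.emp_partition_tbl").filter("location = 'BANGALORE'"))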
Wrapping Up
In this post, we have learned how to store dataframe data in a partitioned Delta table. This demonstration used only a single partition column.
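If you ever need more than one partition column, partitionBy accepts multiple column names. A minimal sketch (the table name testdb.emp_multi_partition_tbl is hypothetical):

%scala
// Partitions by deptno first, then location, producing nested
// deptno=<value>/location=<value> directories under the table path.
df.write.format("delta")
  .partitionBy("deptno", "location")
  .saveAsTable("testdb.emp_multi_partition_tbl")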