Create Delta Table with Partition from CSV File in Databricks

Requirement

In the last post, we imported a CSV file and created a Delta table in Databricks. In this post, we extend the same exercise with partitioning. Partitioning is an important concept when it comes to storing data in tables, so here we are going to create a partitioned Delta table from a CSV file in Databricks.

If you want to know more about partitioning, you can read the Partitioning in Hive post.

Solution

Let’s use the same sample data:

empno  ename   designation  manager  hire_date   sal   deptno  location
9369   SMITH   CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN   SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
9521   WARD    SALESMAN     7698     2/22/1981   1250  30      PUNE
9566   TURNER  MANAGER      7839     4/2/1981    2975  20      MUMBAI
9654   MARTIN  SALESMAN     7698     9/28/1981   1250  30      CHENNAI
9369   SMITH   CLERK        7902     12/17/1980  800   20      KOLKATA

Step 1: Load CSV in Dataframe

First, we read the data from the CSV file. Here is the code:

%scala
// Location of the uploaded CSV file in DBFS
val file_location = "/FileStore/tables/emp_data1-3.csv"

// Read the CSV with a header row, comma separator, and inferred column types
val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

display(df)
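
Since inferSchema is enabled, it is worth confirming what Spark actually inferred: hire_date values such as 12/17/1980 are usually read as plain strings. Here is a minimal sketch for casting that column to a proper date before writing the table; the MM/dd/yyyy format string is an assumption based on the sample data above.

%scala
import org.apache.spark.sql.functions.{col, to_date}

// Inspect the inferred schema; hire_date typically comes in as a string
df.printSchema()

// Optionally cast hire_date to DateType (assuming MM/dd/yyyy format)
val dfTyped = df.withColumn("hire_date", to_date(col("hire_date"), "MM/dd/yyyy"))
display(dfTyped)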

Step 2: Create Delta Table with Partition

Once the data is available in the dataframe, we can write it directly to any existing or new table.

%scala
df.write.format("delta").partitionBy("location").saveAsTable("testdb.emp_partition_tbl")

Here, we have saved the data in Delta format, partitioned by the location column.
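
The benefit of partitioning shows up at read time: when a query filters on the partition column, Delta scans only the matching location=... directories instead of the whole table. Here is a minimal sketch of such a pruned read; the variable name is illustrative.

%scala
import org.apache.spark.sql.functions.col

// Filtering on the partition column lets Delta prune down to the
// location=HYDERABAD directory instead of scanning the full table
val hydEmployees = spark.table("testdb.emp_partition_tbl")
  .filter(col("location") === "HYDERABAD")

display(hydEmployees)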

Step 3: Check Partition in Table

Use the command below to check the partitions of the table:

%sql
SHOW PARTITIONS testdb.emp_partition_tbl
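
Behind the scenes, each partition value becomes its own subdirectory under the table's storage location. You can see this layout from Scala as well; note that the warehouse path below is an assumption, valid for managed tables created in the default Hive warehouse location.

%scala
// List the table directory; expect one location=<value> folder per partition
// (the default managed-table warehouse path below is an assumption)
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl"))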

Wrapping Up

In this post, we have learned how to store dataframe data in a partitioned Delta table. This demonstration used a single partition column; partitionBy also accepts multiple columns if you need a finer-grained layout.
