Requirement
In the last post, we imported a CSV file and created a Delta table in Databricks. In this post, we extend the same exercise with partitioning. Partitioning is an important concept when it comes to storing data in tables. Here, we will create a partitioned Delta table from a CSV file in Databricks.
If you want to know more about partitioning, you can read the Partitioning in Hive post.
Solution
Let’s use the same sample data:
| empno | ename | designation | manager | hire_date | sal | deptno | location |
| ----- | ------ | ----------- | ------- | ---------- | ---- | ------ | --------- |
| 9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
| 9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
| 9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
| 9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
| 9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
| 9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
Step 1: Load the CSV into a DataFrame
First, we read the data from the CSV file:
%scala
val file_location = "/FileStore/tables/emp_data1-3.csv"
val df = spark.read.format("csv")
.option("inferSchema", "true")
.option("header", "true")
.option("sep", ",")
.load(file_location)
display(df)
Step 2: Create Delta Table with Partition
Once the data is available in the DataFrame, we can write it directly into any existing or new table.
df.write.format("delta").partitionBy("location").saveAsTable("testdb.emp_partition_tbl")
Here, we have saved the data in Delta format, partitioned by the location column.
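Under the hood, `partitionBy` writes one Hive-style subdirectory per distinct value of the partition column. The helper below is a rough sketch of the naming scheme only; the warehouse root path is an assumption for a default Databricks metastore, and the actual files inside each directory are managed by Delta.

```scala
// Illustrative helper: builds the Hive-style partition directory name
// that partitionBy("location") produces for a given value.
def partitionDir(tableRoot: String, col: String, value: String): String =
  s"$tableRoot/$col=$value"

// Assumed default-metastore location for the table created above.
val root = "dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl"

Seq("BANGALORE", "HYDERABAD", "PUNE").foreach { loc =>
  println(partitionDir(root, "location", loc))
}
// dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl/location=BANGALORE
// dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl/location=HYDERABAD
// dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl/location=PUNE
```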
Step 3: Check the Partitions in the Table
Use the command below to check the partitions in the table:
%sql show partitions testdb.emp_partition_tbl
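The main benefit of partitioning shows up at read time: a filter on the partition column lets Delta skip every directory that does not match (partition pruning). The snippet below is a plain-Scala sketch of that idea on a few of the sample rows; the equivalent Spark query is shown in the comment.

```scala
// Conceptual sketch of partition pruning on the sample data above.
// In Spark, the equivalent partition-pruned read would be roughly:
//   spark.table("testdb.emp_partition_tbl")
//     .filter(org.apache.spark.sql.functions.col("location") === "PUNE")
// and only the location=PUNE directory would be scanned.
case class Emp(empno: Int, ename: String, location: String)

val rows = Seq(
  Emp(9369, "SMITH", "BANGALORE"),
  Emp(9521, "WARD", "PUNE"),
  Emp(9654, "MARTIN", "CHENNAI"))

// Only rows from the matching "partition" survive the filter.
val pruned = rows.filter(_.location == "PUNE")
println(pruned.map(_.ename).mkString(", "))  // WARD
```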
Wrapping Up
In this post, we learned how to store DataFrame data in a partitioned Delta table. This demonstration used only a single partition column.
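The same approach extends to multiple partition columns. A sketch, assuming a new table name `testdb.emp_partition_tbl2` (illustrative only) and the `df` from Step 1:

```scala
// Hedged sketch: partitioning by two columns instead of one.
//   df.write.format("delta")
//     .partitionBy("deptno", "location")
//     .saveAsTable("testdb.emp_partition_tbl2")

// With multiple partition columns, the directories nest in the order
// given to partitionBy, e.g. deptno=20/location=BANGALORE:
def nestedPartitionPath(cols: Seq[(String, String)]): String =
  cols.map { case (c, v) => s"$c=$v" }.mkString("/")

println(nestedPartitionPath(Seq("deptno" -> "20", "location" -> "BANGALORE")))
// deptno=20/location=BANGALORE
```

Note that partitioning by a high-cardinality column (or too many columns) creates many small directories, so partition columns are usually chosen from low-cardinality fields like `location` or `deptno`.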