Requirement
In the last post, we imported a CSV file and created a Delta table in Databricks. In this post, we extend the same exercise with partitioning. Partitioning is an important concept when it comes to storing data in tables, and here we are going to create a partitioned Delta table from a CSV file in Databricks.
If you want to know more about partitioning, you can read the Partitioning in Hive post.
Solution
Let’s use the same sample data:
| empno | ename | designation | manager | hire_date | sal | deptno | location |
| ----- | ------ | ----------- | ------- | ---------- | ---- | ------ | --------- |
| 9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
| 9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
| 9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
| 9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
| 9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
| 9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
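For reference, the raw file behind this table would look roughly like the lines below, assuming a header row and comma separators (matching the read options used in Step 1):

empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD

and so on for the remaining rows.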
Step 1: Load CSV in Dataframe
First, we have to read the data from the CSV file into a dataframe. Here is the code:
%scala
val file_location = "/FileStore/tables/emp_data1-3.csv"

val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

display(df)
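Before writing the table, it is worth confirming that inferSchema picked up sensible column types. A quick optional check (not part of the original steps):

%scala
// Print the schema Spark inferred from the CSV header and data.
df.printSchema()

// Confirm the number of rows loaded from the file.
println(s"Rows loaded: ${df.count()}")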
Step 2: Create Delta Table with Partition
Once the data is available in the dataframe, we can write it directly to any existing or new table.
df.write.format("delta").partitionBy("location").saveAsTable("testdb.emp_partition_tbl")
Here, we have saved the data in Delta format, partitioned by the location column.
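On storage, Delta writes one subdirectory per distinct location value. If you want to see that layout, you can list the table directory; a minimal sketch, assuming a managed table under the default Hive warehouse path (adjust the path for your workspace):

%scala
// Each partition appears as a location=<value> subdirectory.
// The path below is an assumption for a managed table in testdb.
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/testdb.db/emp_partition_tbl"))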
Step 3: Check Partition in Table
Use the command below to check the partitions in the table:
%sql
SHOW PARTITIONS testdb.emp_partition_tbl
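Based on the sample data, this should list one partition per distinct location value (BANGALORE, HYDERABAD, PUNE, MUMBAI, CHENNAI, and KOLKATA). The real payoff is partition pruning: filtering on the partition column lets Delta skip the non-matching directories entirely. A minimal sketch of such a query:

%scala
// Only the location=BANGALORE partition directory is scanned here.
display(spark.table("testdb.emp_partition_tbl").filter("location = 'BANGALORE'"))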
Wrapping Up
In this post, we have learned how to store dataframe data in a partitioned Delta table. This demonstration used only a single partition column.
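If you ever need more than one partition column, partitionBy accepts multiple column names. A minimal sketch (the table name testdb.emp_multi_partition_tbl is hypothetical):

%scala
// Partitions by deptno first, then location, producing nested
// deptno=<value>/location=<value> directories under the table path.
df.write.format("delta")
  .partitionBy("deptno", "location")
  .saveAsTable("testdb.emp_multi_partition_tbl")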