Requirement
In the last post, we learned how to create a Delta table from a path in Databricks. In this post, we will learn how to create a partitioned Delta table in Databricks.
Solution
Partitioning splits the data by the values of one or more columns and stores each group separately. You can learn more about partitioning here: https://bigdataprogrammers.com/partition-in-hive/.
Create Table with Partition
To create a partitioned Delta table, use the template below:
CREATE TABLE <table_name> ( <column name> <data type>, <column name> <data type>, .. ) USING DELTA PARTITIONED BY ( <partition_column name> <data type> ) LOCATION '<Path of the data>';
With the same template, let’s create a table for the below sample data:
Sample Data
empno | ename | designation | manager | hire_date | sal | deptno | location |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
CREATE TABLE employee_delta ( empno INT, ename STRING, manager INT, hire_date DATE, sal BIGINT, deptno INT, location STRING ) USING DELTA PARTITIONED BY ( designation STRING ) LOCATION '/mnt/bdpdatalake/blob-storage/';
Here, we have created the table partitioned by designation. Once data is written to the table, a subfolder is created under the Location path for each designation value, with names like designation=CLERK and designation=SALESMAN.
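As a rough sketch of how the partitioned table might be populated and checked, the statements below insert three of the sample rows and then inspect the table metadata. The explicit column list in the INSERT is an assumption about the runtime (it avoids depending on where the partition column sits in the schema), and the dates are rewritten as ISO date literals because the sample data's M/D/YYYY strings are not DATE values as written.

```sql
-- Insert a few of the sample rows. Naming the columns explicitly means the
-- statement does not depend on the position of the partition column.
INSERT INTO employee_delta
  (empno, ename, designation, manager, hire_date, sal, deptno, location)
VALUES
  (9369, 'SMITH', 'CLERK',    7902, DATE'1980-12-17', 800,  20, 'BANGALORE'),
  (9499, 'ALLEN', 'SALESMAN', 7698, DATE'1981-02-20', 1600, 30, 'HYDERABAD'),
  (9521, 'WARD',  'SALESMAN', 7698, DATE'1981-02-22', 1250, 30, 'PUNE');

-- Inspect the Delta table metadata; the partitionColumns field of the
-- result should list designation.
DESCRIBE DETAIL employee_delta;
```

After the insert, listing the Location path should show one subfolder per distinct designation value that was written.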
Wrapping Up
In this post, we have learned how to create a Delta table with a partition. Partitioning is useful when there is a large volume of data for each value of the partition column, because queries that filter on that column can then read only the matching partitions and run faster. It is also important to understand when partitioning helps and when it does not.
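The two queries below are a minimal illustration of these points, assuming the employee_delta table from this post already contains data. The first filters on the partition column, so only the matching partition needs to be read. The second is one simple way to check how the rows are distributed across partition values before settling on a partition column; what counts as too many or too small partitions depends on your data and is not prescribed here.

```sql
-- Filter on the partition column so that only the SALESMAN partition
-- has to be scanned.
SELECT ename, sal, location
FROM employee_delta
WHERE designation = 'SALESMAN';

-- Count rows per partition value. Many values with very few rows each
-- suggests the column may be a poor choice for partitioning.
SELECT designation, COUNT(*) AS row_count
FROM employee_delta
GROUP BY designation
ORDER BY row_count DESC;
```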