Requirement

In the last post, we learned how to create a Delta table from a path in Databricks. In this post, we will learn how to create a partitioned Delta table in Databricks.

Solution

Partitioning splits the data by the values of one or more columns and stores each group separately. You can learn more about partitions here: https://bigdataprogrammers.com/partition-in-hive/.

Create Table with Partition

For creating a Delta table, below is the template:

 CREATE TABLE <table_name> (
<column name> <data type>,
<column name> <data type>,
..,
<partition_column name> <data type>
)
USING DELTA
PARTITIONED BY (<partition_column name>)
LOCATION '<Path of the data>';

With the same template, let’s create a table for the below sample data:

Sample Data

empno	ename	designation	manager	hire_date	sal	deptno	location
9369	SMITH	CLERK	7902	12/17/1980	800	20	BANGALORE
9499	ALLEN	SALESMAN	7698	2/20/1981	1600	30	HYDERABAD
9521	WARD	SALESMAN	7698	2/22/1981	1250	30	PUNE
9566	TURNER	MANAGER	7839	4/2/1981	2975	20	MUMBAI
9654	MARTIN	SALESMAN	7698	9/28/1981	1250	30	CHENNAI
9369	SMITH	CLERK	7902	12/17/1980	800	20	KOLKATA

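 CREATE TABLE employee_delta (
      empno INT,
      ename STRING,
      designation STRING,  -- partition column, declared with the other columns
      manager INT,
      hire_date DATE,
      sal BIGINT,
      deptno INT,
      location STRING
)
USING DELTA
PARTITIONED BY (designation)  -- the keyword is PARTITIONED BY and follows USING
LOCATION '/mnt/bdpdatalake/blob-storage/';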

Here, we have created the table partitioned by designation. Once data is written, a subfolder is created under the Location path for each designation value, typically named like designation=CLERK and designation=SALESMAN.
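
To see the partitioning in action, a few of the sample rows can be loaded and the table inspected. The statements below are only a sketch: they assume the employee_delta table exists with the columns listed above, that the runtime accepts an explicit column list in INSERT, and that the hire dates are rewritten as ISO date literals.

```sql
-- Load three of the sample rows; the explicit column list keeps the
-- statement independent of the column order in the table definition.
INSERT INTO employee_delta
      (empno, ename, designation, manager, hire_date, sal, deptno, location)
VALUES
      (9369, 'SMITH', 'CLERK',    7902, DATE '1980-12-17',  800, 20, 'BANGALORE'),
      (9499, 'ALLEN', 'SALESMAN', 7698, DATE '1981-02-20', 1600, 30, 'HYDERABAD'),
      (9566, 'TURNER', 'MANAGER', 7839, DATE '1981-04-02', 2975, 20, 'MUMBAI');

-- The partitionColumns field of the result should list designation.
DESCRIBE DETAIL employee_delta;

-- A filter on the partition column lets the engine read only the
-- files that belong to the matching partition.
SELECT empno, ename, sal
FROM employee_delta
WHERE designation = 'SALESMAN';
```

After the INSERT, listing the Location path should show one subfolder per designation value that was loaded.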

Wrapping Up

In this post, we have learned how to create a Delta table with a partition. Partitioning is useful when there is a large amount of data for each value of the partition column, because queries that filter on that column can skip the other partitions and run faster. It is also important to understand when to use partitioning and when not to.
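
One rough way to judge whether a column is a reasonable partition key is to look at how many distinct values it has and how many rows fall under each value. The query below is a sketch against the employee_delta table from this post; what counts as too many partitions or too few rows per partition depends on your data volume and is not something this query decides for you.

```sql
-- Count the rows per candidate partition value. Many values with only
-- a handful of rows each usually mean many small files, which tends
-- to hurt performance rather than help it.
SELECT designation,
       COUNT(*) AS row_count
FROM employee_delta
GROUP BY designation
ORDER BY row_count DESC;
```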
