Write DataFrame to Delta Table in Databricks with Overwrite Mode

Requirement

In this post, we will learn how to store a processed DataFrame to a Delta table in Databricks with overwrite mode. Overwrite mode deletes the existing data in the table and loads only the new records.
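As a quick sketch of the difference, the save mode is the only thing that changes on the writer; the table name emp_data here is just a placeholder:

// "append" keeps the rows already in the table and adds the new ones;
// "overwrite" replaces everything with the new DataFrame contents
df.write.mode("append").format("delta").saveAsTable("emp_data")
df.write.mode("overwrite").format("delta").saveAsTable("emp_data")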

Solution

For this exercise, we will use the following data:

| empno | ename  | designation | manager | hire_date  | sal  | deptno | location  |
|-------|--------|-------------|---------|------------|------|--------|-----------|
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | BANGALORE |
| 9499  | ALLEN  | SALESMAN    | 7698    | 2/20/1981  | 1600 | 30     | HYDERABAD |
| 9521  | WARD   | SALESMAN    | 7698    | 2/22/1981  | 1250 | 30     | PUNE      |
| 9566  | TURNER | MANAGER     | 7839    | 4/2/1981   | 2975 | 20     | MUMBAI    |
| 9654  | MARTIN | SALESMAN    | 7698    | 9/28/1981  | 1250 | 30     | CHENNAI   |
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | KOLKATA   |
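Saved as a comma-separated file (the next step assumes this file has been uploaded to DBFS), the sample data looks like this:

empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD
9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE
9566,TURNER,MANAGER,7839,4/2/1981,2975,20,MUMBAI
9654,MARTIN,SALESMAN,7698,9/28/1981,1250,30,CHENNAI
9369,SMITH,CLERK,7902,12/17/1980,800,20,KOLKATA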

First, load this data into a DataFrame using the code below:

val file_location = "/FileStore/tables/emp_data1-3.csv"

// Read the CSV file with a header row, inferring column types
val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

display(df)

Next, write the DataFrame to a Delta table in overwrite mode. Here, permanent_table_name holds the target table name (for example, emp_data):

val permanent_table_name = "emp_data"

df.write.mode("overwrite").format("delta").saveAsTable(permanent_table_name)
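Optionally, you can confirm that each rerun created a new table version rather than appending rows by inspecting the Delta transaction history (this assumes the emp_data table name used above):

// Each overwrite shows up as a new table version whose operation
// is a WRITE in Overwrite mode
display(spark.sql("DESCRIBE HISTORY emp_data"))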

Data Validation

When you query the table, it returns only 6 records even after rerunning the code, because each run overwrites the data in the table. The table always contains just the latest load.
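As a quick check (again assuming the table was saved as emp_data), query the table and count the rows:

// The row count stays at 6 no matter how many times the
// overwrite above is rerun
val result = spark.table("emp_data")
display(result)
println(result.count())  // 6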

Wrapping Up

In this post, we stored DataFrame data in a Delta table with overwrite mode, which means the existing data in the table is deleted and the new data is inserted.
