Write DataFrame to Delta Table in Databricks with Overwrite Mode

Requirement

In this post, we will learn how to store a processed DataFrame to a Delta table in Databricks with overwrite mode. Overwrite mode deletes the existing data in the table and loads only the new records.
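As a quick sketch of the difference, the save mode is the only thing that changes on the writer; the table name emp_data here is just a placeholder:

// "append" keeps the rows already in the table and adds the new ones;
// "overwrite" replaces everything with the new DataFrame contents
df.write.mode("append").format("delta").saveAsTable("emp_data")
df.write.mode("overwrite").format("delta").saveAsTable("emp_data")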

Solution

For this exercise, we will use the following data:

| empno | ename  | designation | manager | hire_date  | sal  | deptno | location  |
|-------|--------|-------------|---------|------------|------|--------|-----------|
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | BANGALORE |
| 9499  | ALLEN  | SALESMAN    | 7698    | 2/20/1981  | 1600 | 30     | HYDERABAD |
| 9521  | WARD   | SALESMAN    | 7698    | 2/22/1981  | 1250 | 30     | PUNE      |
| 9566  | TURNER | MANAGER     | 7839    | 4/2/1981   | 2975 | 20     | MUMBAI    |
| 9654  | MARTIN | SALESMAN    | 7698    | 9/28/1981  | 1250 | 30     | CHENNAI   |
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | KOLKATA   |
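Saved as a comma-separated file (the next step assumes this file has been uploaded to DBFS), the sample data looks like this:

empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD
9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE
9566,TURNER,MANAGER,7839,4/2/1981,2975,20,MUMBAI
9654,MARTIN,SALESMAN,7698,9/28/1981,1250,30,CHENNAI
9369,SMITH,CLERK,7902,12/17/1980,800,20,KOLKATA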

First, load this data into a DataFrame using the code below:

val file_location = "/FileStore/tables/emp_data1-3.csv"

// Read the CSV file with a header row, inferring column types
val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

display(df)

Next, write the DataFrame to a Delta table in overwrite mode. Here, permanent_table_name holds the target table name (for example, emp_data):

val permanent_table_name = "emp_data"

df.write.mode("overwrite").format("delta").saveAsTable(permanent_table_name)
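Optionally, you can confirm that each rerun created a new table version rather than appending rows by inspecting the Delta transaction history (this assumes the emp_data table name used above):

// Each overwrite shows up as a new table version whose operation
// is a WRITE in Overwrite mode
display(spark.sql("DESCRIBE HISTORY emp_data"))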

Data Validation

When you query the table, it returns only 6 records even after rerunning the code, because each run overwrites the data in the table. The table always contains just the latest load.
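As a quick check (again assuming the table was saved as emp_data), query the table and count the rows:

// The row count stays at 6 no matter how many times the
// overwrite above is rerun
val result = spark.table("emp_data")
display(result)
println(result.count())  // 6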

Wrapping Up

In this post, we stored DataFrame data in a Delta table with overwrite mode, which means the existing data in the table is deleted and the new data is inserted.
