Requirement
In this post, we will learn how to store a processed dataframe in a Delta table in Databricks using overwrite mode. Overwrite mode deletes the existing data in the table and loads only the new records.
Solution
For this exercise, we will use the data below:
| empno | ename  | designation | manager | hire_date  | sal  | deptno | location  |
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | BANGALORE |
| 9499  | ALLEN  | SALESMAN    | 7698    | 2/20/1981  | 1600 | 30     | HYDERABAD |
| 9521  | WARD   | SALESMAN    | 7698    | 2/22/1981  | 1250 | 30     | PUNE      |
| 9566  | TURNER | MANAGER     | 7839    | 4/2/1981   | 2975 | 20     | MUMBAI    |
| 9654  | MARTIN | SALESMAN    | 7698    | 9/28/1981  | 1250 | 30     | CHENNAI   |
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | KOLKATA   |
First, load this data into a dataframe using the code below:
// Read the CSV file into a dataframe, inferring the schema from the data
val file_location = "/FileStore/tables/emp_data1-3.csv"

val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

display(df)
Next, write the dataframe to a Delta table in overwrite mode. The variable permanent_table_name holds the target table name; the value used here is an assumed example:

// Target table name; "emp_data" is an assumed example value
val permanent_table_name = "emp_data"

// Overwrite replaces the table's existing data with the dataframe contents
df.write.mode("overwrite").format("delta").saveAsTable(permanent_table_name)
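To confirm the write succeeded, you can read the table back; a minimal check, using the table name defined above:

// Read the Delta table back and display its contents
val loaded = spark.table(permanent_table_name)
display(loaded)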
Data Validation
When you query the table, it returns only 6 records, even after rerunning the code, because each run overwrites the data in the table. The table always holds just the latest load.
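If you want to see this behavior for yourself, Delta keeps a version history of the table, and each rerun shows up as a new WRITE operation with overwrite mode; a quick check, again using the table name from above:

// Each overwrite appears as a new version in the Delta transaction log
display(spark.sql(s"DESCRIBE HISTORY $permanent_table_name"))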
Wrapping Up
In this post, we stored dataframe data in a Delta table using overwrite mode, which means the existing data in the table is deleted before the new data is inserted.