Delta vs Parquet in Databricks

Requirement

In our previous post, we learned how to create Delta tables and Parquet tables. This post compares the two table formats.

Solution

Both table formats are useful; which one fits depends on your requirements. The differences between Delta and Parquet tables are listed below.

Delta

    1. Keeps versions of the data, so we can read the table as of an earlier point in time. That means older versions of the data in the table remain accessible.
    2. Keeps a transaction log that tracks every commit on the table or Delta path. For this, it creates an additional folder named _delta_log at the data path.
    3. Supports ACID transactions, so Insert, Update, and Delete operations can all be run on the data.

Parquet

    1. Does not keep data versions. It holds only the current version of the data, so older versions cannot be retrieved.
    2. Does not keep a transaction log. It holds only the data files and the _SUCCESS status file.
    3. Does not provide ACID transactions or in-place Update and Delete. It is mostly used for appended data (Insert only).
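
The _delta_log folder described above is also the quickest way to tell the two layouts apart on disk. Here is a minimal sketch in plain Python, assuming the table directory is reachable as a local path (how it is mounted in your workspace is an assumption here). The function names are hypothetical. It relies on the Delta convention that each commit is stored as a zero-padded, 20-digit JSON file inside _delta_log.

```python
import re
from pathlib import Path

# Delta commit files look like 00000000000000000003.json (20 digits).
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def detect_table_format(table_path):
    """Guess whether a directory holds a Delta table or plain Parquet files."""
    root = Path(table_path)
    if (root / "_delta_log").is_dir():
        return "delta"
    # Plain Parquet output: data files (possibly in partition folders)
    # and, typically, a _SUCCESS marker written by the job.
    if (root / "_SUCCESS").exists() or any(root.rglob("*.parquet")):
        return "parquet"
    return "unknown"


def list_delta_versions(table_path):
    """Return the commit versions still present in _delta_log, oldest first."""
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return []  # a plain Parquet directory has no version history
    versions = []
    for entry in log_dir.iterdir():
        match = _COMMIT_FILE.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)
```

A Delta directory also contains Parquet data files, which is why the check for _delta_log comes first.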

These are the main feature differences between Delta and Parquet. The earlier post covers the commands used to create Delta and Parquet tables.

Choose Between Delta vs Parquet

With the differences clear, the next step is to choose between the two formats. The decision depends on your needs.

Delta is preferable in the following situations:

  • Many Insert and Delete transactions run against the data
  • Your data requires Updates
  • You want to keep versions of the data
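
The situations in this list map onto a handful of Delta SQL statements. The sketch below only builds the statement text in Python, so it can be read without a cluster. The helper names and the `events` table are hypothetical, and in a Databricks notebook each string would be passed to `spark.sql(...)`.

```python
def time_travel_sql(table, version):
    """SELECT an older snapshot of a Delta table by version number."""
    if version < 0:
        raise ValueError("Delta versions start at 0")
    return f"SELECT * FROM {table} VERSION AS OF {int(version)}"


def history_sql(table):
    """List the commits recorded for a Delta table."""
    return f"DESCRIBE HISTORY {table}"


def update_sql(table, assignments, condition):
    """UPDATE rows in place; assignments maps column name -> SQL expression."""
    set_clause = ", ".join(f"{col} = {expr}" for col, expr in assignments.items())
    return f"UPDATE {table} SET {set_clause} WHERE {condition}"


def delete_sql(table, condition):
    """DELETE the rows that match the condition."""
    return f"DELETE FROM {table} WHERE {condition}"


# Values are interpolated as-is, so only use this with trusted input.
# Hypothetical usage in a notebook where `spark` is already defined:
#   spark.sql(update_sql("events", {"status": "'closed'"}, "id = 42"))
#   spark.sql(delete_sql("events", "event_date < '2020-01-01'"))
#   spark.sql(time_travel_sql("events", 0)).show()
```

Each UPDATE or DELETE adds a new version to the table, which is what makes the earlier snapshot readable afterwards.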

Parquet is preferable in the following situations:

  • There is only new data being appended
  • Updates are not required

Although Delta has many features, it requires a little additional maintenance. Since it keeps versions, old data versions need to be cleaned up periodically (for example with the VACUUM command) to improve performance. Further, if you integrate this data with another system that cannot read the Delta format, you will need to convert the data, which adds an extra layer.
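
Both maintenance points can be sketched briefly. The first helper builds a VACUUM statement, which removes data files that are no longer needed by versions inside the retention window. The second copies the current snapshot of a Delta table out as plain Parquet for a system that cannot read Delta. Function names and paths are hypothetical, and `spark` is assumed to be an existing SparkSession with Delta support, as in a Databricks notebook. The 168-hour floor mirrors Delta's default 7-day retention.

```python
DEFAULT_RETENTION_HOURS = 168  # Delta's default retention: 7 days


def vacuum_sql(table, retain_hours=DEFAULT_RETENTION_HOURS):
    """Build a VACUUM statement for the given retention window."""
    if retain_hours < DEFAULT_RETENTION_HOURS:
        # Delta rejects shorter windows unless a safety check is disabled;
        # this sketch simply refuses instead of touching that setting.
        raise ValueError("retention below 168 hours needs an explicit override")
    return f"VACUUM {table} RETAIN {int(retain_hours)} HOURS"


def export_delta_as_parquet(spark, delta_path, parquet_path):
    """Write the current snapshot of a Delta table as plain Parquet files."""
    snapshot = spark.read.format("delta").load(delta_path)
    # The copy has no _delta_log, so it carries no history or ACID guarantees.
    snapshot.write.mode("overwrite").parquet(parquet_path)


# Hypothetical usage in a notebook where `spark` is already defined:
#   spark.sql(vacuum_sql("events"))
#   export_delta_as_parquet(spark, "/path/to/delta_table", "/path/to/parquet_copy")
```

Keep in mind that once VACUUM has removed the files behind an old version, that version can no longer be read with time travel.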

Wrapping Up

In this post, we have seen the differences between Delta and Parquet. We have also discussed how to choose the right format for our requirements.
