Requirement
Let’s say you have Delta tables in Databricks that point to data stored in the data lake in Delta format. You now need to analyze this data from a different Databricks workspace. Depending on your requirements, you can either create a table in the new workspace that points to the data, or query the Delta data directly without defining a table first. In this post, we will learn how to query the Delta data directly.
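For comparison, the first option (creating a table that points to the existing data) might look like the sketch below. It assumes the Delta data sits at the mount path used later in this post and that the mount is available in the new workspace; the table name users_delta is hypothetical.

```sql
-- Register a table over existing Delta data; the schema is read from the Delta log.
-- users_delta is a hypothetical name chosen for illustration.
CREATE TABLE IF NOT EXISTS users_delta
USING DELTA
LOCATION '/mnt/blob-storage/testDeltaTable2/';
```

This registers metadata only and does not copy the data. The rest of this post skips this step entirely.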
Solution
Let’s say we have a Delta dataset that holds users’ basic information, including their names, contact information, and location. This dataset is available in the data lake in Delta format.
In the folder listing for this dataset, the first row shows _delta_log, the transaction log folder that tracks the data versions, and the remaining rows show the snappy.parquet data files.
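The versions recorded in _delta_log can also be inspected by path, without a table. A minimal sketch, assuming the data lives at the mount path queried later in this post and that it is run in a %sql cell:

```sql
-- List the commits recorded in _delta_log for this path
-- (one row per version, with its timestamp and operation).
DESCRIBE HISTORY delta.`/mnt/blob-storage/testDeltaTable2/`;
```

The output depends on how the data was written, so the number of versions you see will vary.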
We will read this Delta data directly with a SELECT query, without creating a table. The SQL command below reads the data from its path so we can analyze it.
%sql select * from delta.`/mnt/blob-storage/testDeltaTable2/`
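The same path-based syntax works for more than SELECT *. The sketch below shows an aggregation and a read of an earlier version; it assumes the dataset has a column named location (a hypothetical name, since the actual column names are not listed in this post) and that version 0 is still retained. Run each statement in its own %sql cell.

```sql
-- Count users per location; `location` is a hypothetical column name.
SELECT location, COUNT(*) AS user_count
FROM delta.`/mnt/blob-storage/testDeltaTable2/`
GROUP BY location
ORDER BY user_count DESC;

-- Read the data as it was at its first commit (time travel by version number).
SELECT *
FROM delta.`/mnt/blob-storage/testDeltaTable2/` VERSION AS OF 0;
```

The path is wrapped in backticks because it is used as an identifier after the delta. prefix, not as a string literal.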
Wrapping Up
In this post, we learned how to query Delta data directly in Databricks. You can use this approach whenever you have the path to a Delta folder in the data lake; you don’t need to create a table in order to query it.