Requirement
In this post, we are going to learn how to check if a DataFrame is empty in Spark. This is an important part of development, because this condition decides whether the transformation logic should execute on the DataFrame at all.
Solution
Let’s first understand how an empty DataFrame can affect you and why it is important to check for it.
When you read or load data into a DataFrame and there is no data, it is better not to process it. Without this check, you end up running multiple transformations and actions against empty data, which wastes cluster time.
First, create an empty dataframe:
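Here is a minimal sketch of the setup; the SparkSession builder settings (appName and master values) are placeholders for illustration and not part of the original snippet:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("EmptyDataFrameCheck") // placeholder app name
  .master("local[*]")             // placeholder: run locally for testing
  .getOrCreate()

// emptyDataFrame returns a DataFrame with no rows and an empty schema
val df = spark.emptyDataFrame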
There are multiple ways to check if a DataFrame is empty. Most of the time, people use the count action to check whether the DataFrame has any records.
Approach 1: Using Count
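The count action returns the total number of rows, so the DataFrame is non-empty when the count is greater than zero. Keep in mind that count scans the entire DataFrame, which makes this the most expensive of the four checks on large datasets:

// count() triggers a full job over all partitions, so this can be slow on big data
if (df.count() > 0) println("DF is not Empty") else println("DF is Empty")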
Approach 2: Using head and isEmpty
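head(1) returns an Array[Row] containing at most one row, so calling isEmpty on that array tells us whether the DataFrame has any data. Spark only needs to find a single row, which is much cheaper than a full count:

// head(1) asks Spark for at most one row and stops as soon as it finds one
if (df.head(1).isEmpty) println("DF is Empty") else println("DF is not Empty")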
Approach 3: Using take and isEmpty
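take(1) behaves the same way as head(1): it also returns an Array[Row] of at most one row, so the check is identical:

// take(n) delegates to head(n) for DataFrames; the same early-exit behaviour applies
if (df.take(1).isEmpty) println("DF is Empty") else println("DF is not Empty")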
Approach 4: Convert to RDD and isEmpty
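Converting the DataFrame to an RDD gives access to the RDD's isEmpty method, which likewise only looks for a single element. The conversion itself adds some overhead, so the head/take approaches are usually preferable:

// RDD.isEmpty internally takes one element; the rdd conversion adds some overhead
if (df.rdd.isEmpty) println("DF is Empty") else println("DF is not Empty")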
Full Code Snippet
val df = spark.emptyDataFrame

// Approach 1: Using count
if (df.count() > 0) println("DF is not Empty") else println("DF is Empty")

// Approach 2: Using head and isEmpty
if (df.head(1).isEmpty) println("DF is Empty") else println("DF is not Empty")

// Approach 3: Using take and isEmpty
if (df.take(1).isEmpty) println("DF is Empty") else println("DF is not Empty")

// Approach 4: Convert to RDD and isEmpty
if (df.rdd.isEmpty) println("DF is Empty") else println("DF is not Empty")
Wrapping Up
In this post, we have learned how to check if a DataFrame is empty. This can be done in several ways, but the approach should be picked with performance in mind: head(1), take(1), and rdd.isEmpty stop after finding a single row, whereas count() scans the entire DataFrame.