Requirement
In this post, we are going to learn how to check if a DataFrame is empty in Spark. This is an important part of development, because this condition decides whether the transformation logic should execute on the DataFrame at all.
Solution
Let’s first understand how an empty DataFrame can affect you and why it is important to check for it.
When you read or load data into a DataFrame and there is no data, it is better not to process it. Without this check, you end up running multiple transformations and actions against empty data, which wastes cluster time.
First, create an empty dataframe:
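Here is a minimal sketch of the setup; the SparkSession builder settings (appName and master values) are placeholders for illustration and not part of the original snippet:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("EmptyDataFrameCheck") // placeholder app name
  .master("local[*]")             // placeholder: run locally for testing
  .getOrCreate()

// emptyDataFrame returns a DataFrame with no rows and an empty schema
val df = spark.emptyDataFrame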
There are multiple ways to check if a DataFrame is empty. Most of the time, people use the count action to check whether the DataFrame has any records.
Approach 1: Using Count
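The count action returns the total number of rows, so the DataFrame is non-empty when the count is greater than zero. Keep in mind that count scans the entire DataFrame, which makes this the most expensive of the four checks on large datasets:

// count() triggers a full job over all partitions, so this can be slow on big data
if (df.count() > 0) println("DF is not Empty") else println("DF is Empty")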
Approach 2: Using head and isEmpty
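head(1) returns an Array[Row] containing at most one row, so calling isEmpty on that array tells us whether the DataFrame has any data. Spark only needs to find a single row, which is much cheaper than a full count:

// head(1) asks Spark for at most one row and stops as soon as it finds one
if (df.head(1).isEmpty) println("DF is Empty") else println("DF is not Empty")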
Approach 3: Using take and isEmpty
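take(1) behaves the same way as head(1): it also returns an Array[Row] of at most one row, so the check is identical:

// take(n) delegates to head(n) for DataFrames; the same early-exit behaviour applies
if (df.take(1).isEmpty) println("DF is Empty") else println("DF is not Empty")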
Approach 4: Convert to RDD and isEmpty
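Converting the DataFrame to an RDD gives access to the RDD's isEmpty method, which likewise only looks for a single element. The conversion itself adds some overhead, so the head/take approaches are usually preferable:

// RDD.isEmpty internally takes one element; the rdd conversion adds some overhead
if (df.rdd.isEmpty) println("DF is Empty") else println("DF is not Empty")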
Full Code Snippet
val df = spark.emptyDataFrame

// Approach 1: Using count
if (df.count() > 0) println("DF is not Empty") else println("DF is Empty")

// Approach 2: Using head and isEmpty
if (df.head(1).isEmpty) println("DF is Empty") else println("DF is not Empty")

// Approach 3: Using take and isEmpty
if (df.take(1).isEmpty) println("DF is Empty") else println("DF is not Empty")

// Approach 4: Convert to RDD and isEmpty
if (df.rdd.isEmpty) println("DF is Empty") else println("DF is not Empty")
Wrapping Up
In this post, we have learned how to check if a DataFrame is empty. This can be done in several ways, but the approach should be picked with performance in mind: head(1), take(1), and rdd.isEmpty stop after finding a single row, whereas count() scans the entire DataFrame.