Create an Empty DataFrame in Spark

Requirement

In this post, we are going to learn how to create an empty DataFrame in Spark, both with and without a schema.

Prerequisite

  • Spark 2.x or above

Solution

We will look at creating an empty DataFrame using several different approaches:

PART I: Empty DataFrame with Schema

Approach 1: Using the createDataFrame Function

import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row

val schema = StructType(
    StructField("empno", StringType, true) ::
    StructField("ename", StringType, true) ::
    StructField("designation", StringType, true) :: 
    StructField("manager", StringType, true) :: Nil
)

// Create Empty DataFrame using schema
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
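To sanity-check the result, you can inspect the schema and row count (reusing the `spark` session and `emptyDF` defined above):

```scala
// The DataFrame has the declared columns but zero rows
emptyDF.printSchema()
// root
//  |-- empno: string (nullable = true)
//  |-- ename: string (nullable = true)
//  |-- designation: string (nullable = true)
//  |-- manager: string (nullable = true)

println(emptyDF.count())  // 0
```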

Approach 2: Using Case Class

case class Employee(empno: String, ename: String, designation: String, manager: String)

// toDF on a local Seq needs the session's implicit encoders in scope
import spark.implicits._

// Create Empty DataFrame using Case Class
val emptyDF2 = Seq.empty[Employee].toDF()

Approach 3: Using Sequence

val schemaSeq = Seq("empno", "ename", "designation", "manager")

// toDF on a local Seq needs the session's implicit encoders (if not already in scope)
import spark.implicits._

// Create Empty DataFrame using Seq
val emptyDF3 = Seq.empty[(String, String, String, String)].toDF(schemaSeq: _*)
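Here the column names come from the sequence passed to `toDF`, while the column types are inferred from the tuple's element types (all `String` in this case). A quick check, reusing `emptyDF3` from above:

```scala
// Column names come from schemaSeq; types come from the tuple
println(emptyDF3.columns.mkString(", "))  // empno, ename, designation, manager
println(emptyDF3.count())                 // 0
```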

PART II: Empty DataFrame without Schema

val emptyDF4 = spark.emptyDataFrame
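Unlike the schema-based variants, `spark.emptyDataFrame` has no columns at all, which you can confirm with a quick check on `emptyDF4` from above:

```scala
// A truly schema-less DataFrame: zero rows and zero columns
println(emptyDF4.columns.length)  // 0
emptyDF4.printSchema()            // prints only "root"
```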

Wrapping Up

In this post, we have learned different approaches to creating an empty DataFrame in Spark, both with and without a schema. Use an explicit schema when the structure of the data is already known; the schema-less variant is useful for dynamic data, i.e. when the schema is not known up front.
