Requirement
In this post, we will learn how to create an empty DataFrame in Spark, both with and without a schema.
Prerequisite
- Spark 2.x or above
Solution
We will see how to create an empty DataFrame using different approaches:
Part I: Empty DataFrame with Schema
Approach 1: Using the createDataFrame Function
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql._

// Define the schema as a list of StructFields
val schema = StructType(
  StructField("empno", StringType, true) ::
  StructField("ename", StringType, true) ::
  StructField("designation", StringType, true) ::
  StructField("manager", StringType, true) :: Nil
)

// Create an empty DataFrame from an empty RDD[Row] and the schema
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
Approach 2: Using a Case Class
// Define a case class describing the row type
case class Employee(empno: String, ename: String, designation: String, manager: String)

// Create an empty DataFrame from the case class
// (toDF requires: import spark.implicits._)
val emptyDF2 = Seq.empty[Employee].toDF()
Approach 3: Using a Sequence
// Column names for the empty DataFrame
val schemaSeq = Seq("empno", "ename", "designation", "manager")

// Create an empty DataFrame from an empty Seq of tuples
// (toDF requires: import spark.implicits._)
val emptyDF3 = Seq.empty[(String, String, String, String)].toDF(schemaSeq: _*)
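As a quick sanity check (a sketch, assuming the `emptyDF`, `emptyDF2`, and `emptyDF3` values defined above and an active SparkSession named `spark`), all three approaches should yield an empty DataFrame with the same four string columns:

```scala
// All three DataFrames should have identical schemas:
// four nullable string columns named empno, ename, designation, manager.
emptyDF.printSchema()
assert(emptyDF.schema == emptyDF2.schema)
assert(emptyDF2.schema == emptyDF3.schema)

// And none of them contain any rows.
assert(emptyDF.count() == 0)
```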
Part II: Empty DataFrame without Schema
// Create an empty DataFrame with no schema at all
val emptyDF4 = spark.emptyDataFrame
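Unlike the schema-based approaches in Part I, `spark.emptyDataFrame` has no columns at all. A small sketch (assuming the `emptyDF4` value defined above) to illustrate the difference:

```scala
// The schema of spark.emptyDataFrame is an empty StructType,
// so printSchema shows only the "root" line.
emptyDF4.printSchema()

// StructType is a Seq[StructField], so we can check it is empty.
assert(emptyDF4.schema.isEmpty)
assert(emptyDF4.count() == 0)
```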
Wrapping Up
In this post, we have learned different approaches to create an empty DataFrame in Spark, with and without a schema. We use a schema when the structure of the data is already known; for dynamic data, i.e. when the schema is unknown, we can create the DataFrame without one.