Requirement
In this post, we will learn how to create an empty DataFrame in Spark, both with and without a schema.
Prerequisite
- Spark 2.x or above
Solution
We will see how to create an empty DataFrame using different approaches:
Part I: Empty DataFrame with Schema
Approach 1: Using the createDataFrame Function
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql._

// Define the schema as a list of StructFields
val schema = StructType(
  StructField("empno", StringType, true) ::
  StructField("ename", StringType, true) ::
  StructField("designation", StringType, true) ::
  StructField("manager", StringType, true) :: Nil
)

// Create an empty DataFrame from an empty RDD[Row] and the schema
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
Approach 2: Using a Case Class
// Define a case class describing the row type
case class Employee(empno: String, ename: String, designation: String, manager: String)

// Create an empty DataFrame from the case class
// (toDF requires: import spark.implicits._)
val emptyDF2 = Seq.empty[Employee].toDF()
Approach 3: Using a Sequence
// Column names for the empty DataFrame
val schemaSeq = Seq("empno", "ename", "designation", "manager")

// Create an empty DataFrame from an empty Seq of tuples
// (toDF requires: import spark.implicits._)
val emptyDF3 = Seq.empty[(String, String, String, String)].toDF(schemaSeq: _*)
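As a quick sanity check (a sketch, assuming the `emptyDF`, `emptyDF2`, and `emptyDF3` values defined above and an active SparkSession named `spark`), all three approaches should yield an empty DataFrame with the same four string columns:

```scala
// All three DataFrames should have identical schemas:
// four nullable string columns named empno, ename, designation, manager.
emptyDF.printSchema()
assert(emptyDF.schema == emptyDF2.schema)
assert(emptyDF2.schema == emptyDF3.schema)

// And none of them contain any rows.
assert(emptyDF.count() == 0)
```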
Part II: Empty DataFrame without Schema
// Create an empty DataFrame with no schema at all
val emptyDF4 = spark.emptyDataFrame
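Unlike the schema-based approaches in Part I, `spark.emptyDataFrame` has no columns at all. A small sketch (assuming the `emptyDF4` value defined above) to illustrate the difference:

```scala
// The schema of spark.emptyDataFrame is an empty StructType,
// so printSchema shows only the "root" line.
emptyDF4.printSchema()

// StructType is a Seq[StructField], so we can check it is empty.
assert(emptyDF4.schema.isEmpty)
assert(emptyDF4.count() == 0)
```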
Wrapping Up
In this post, we have learned different approaches to create an empty DataFrame in Spark, with and without a schema. We use a schema when the structure of the data is already known; for dynamic data, i.e. when the schema is unknown, we can create the DataFrame without one.