Convert RDD to DataFrame in Spark

Requirement

In this post, we will convert an RDD to a DataFrame in Spark using Scala.

Solution

Approach 1: Using Schema Struct Type

// Imports needed for the schema and Row types
// (assumes a spark-shell session, where `spark` and `sc` are predefined)
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Create the RDD:
val dummyRDD = sc.parallelize(Seq(
  ("1001", "Ename1", "Designation1", "Manager1"),
  ("1003", "Ename2", "Designation2", "Manager2"),
  ("1001", "Ename3", "Designation3", "Manager3")
))

val schema = StructType(
    StructField("empno", StringType, true) ::
    StructField("ename", StringType, true) ::
    StructField("designation", StringType, true) :: 
    StructField("manager", StringType, true) :: Nil
)

// Convert each tuple into a Row so the data lines up with the schema
// (the tuple elements are already Strings, so no conversion is needed)
val dummyRDD2 = dummyRDD.map(data => Row(data._1, data._2, data._3, data._4))
val df1 = spark.createDataFrame(dummyRDD2, schema)
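To confirm the conversion worked, a quick sanity check on df1 (a sketch, assuming the spark-shell session and the df1 created above):

```scala
// Inspect the schema produced from the StructType definition
df1.printSchema()
// root
//  |-- empno: string (nullable = true)
//  |-- ename: string (nullable = true)
//  |-- designation: string (nullable = true)
//  |-- manager: string (nullable = true)

// Inspect the rows themselves
df1.show()
```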

Approach 2: Using a Seq of column names

Here we only supply the column names and let Spark infer the column types from the tuple elements. The toDF method comes from the implicit conversions on the SparkSession:

import spark.implicits._

val schemaSeq = Seq("empno", "ename", "designation", "manager")
val df2 = dummyRDD.toDF(schemaSeq: _*)
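Putting both approaches together in a single self-contained sketch (assumes Spark is on the classpath; the object name RddToDfDemo and the local[*] master are illustrative, not from the original post):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RddToDfDemo {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes
    val spark = SparkSession.builder()
      .appName("RddToDfDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val dummyRDD = spark.sparkContext.parallelize(Seq(
      ("1001", "Ename1", "Designation1", "Manager1"),
      ("1003", "Ename2", "Designation2", "Manager2"),
      ("1001", "Ename3", "Designation3", "Manager3")
    ))

    // Approach 1: explicit schema plus a tuple-to-Row conversion
    val schema = StructType(
      StructField("empno", StringType, true) ::
      StructField("ename", StringType, true) ::
      StructField("designation", StringType, true) ::
      StructField("manager", StringType, true) :: Nil
    )
    val rowRDD = dummyRDD.map(d => Row(d._1, d._2, d._3, d._4))
    val df1 = spark.createDataFrame(rowRDD, schema)

    // Approach 2: toDF with a Seq of column names, types inferred
    val df2 = dummyRDD.toDF(Seq("empno", "ename", "designation", "manager"): _*)

    // Both routes should yield the same shape of DataFrame
    assert(df1.schema.fieldNames.sameElements(df2.schema.fieldNames))
    assert(df1.count() == 3 && df2.count() == 3)

    df1.show()
    spark.stop()
  }
}
```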

Wrapping Up

In this post, we have learned two approaches to convert an RDD into a DataFrame in Spark. With the StructType schema we had to convert each tuple to a Row explicitly, whereas the toDF approach did not require that step.
