Requirement
In this post, we will convert an RDD to a DataFrame in Spark using Scala.
Solution
Approach 1: Using Schema Struct Type
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Create the RDD
val dummyRDD = sc.parallelize(Seq(
  ("1001", "Ename1", "Designation1", "Manager1"),
  ("1003", "Ename2", "Designation2", "Manager2"),
  ("1001", "Ename3", "Designation3", "Manager3")
))

// Define the schema
val schema = StructType(
  StructField("empno", StringType, true) ::
  StructField("ename", StringType, true) ::
  StructField("designation", StringType, true) ::
  StructField("manager", StringType, true) :: Nil
)

// Convert each tuple into a Row (a single Row per record, not a nested Row)
val dummyRDD2 = dummyRDD.map(data => Row(data._1, data._2, data._3, data._4))

val df1 = spark.createDataFrame(dummyRDD2, schema)
Approach 2: Using Seq of column
// toDF on an RDD of tuples requires the SparkSession implicits in scope
import spark.implicits._

val schemaSeq = Seq("empno", "ename", "designation", "manager")
val df2 = dummyRDD.toDF(schemaSeq: _*)
Wrapping Up
In this post, we have learned two approaches to convert an RDD into a DataFrame in Spark. In the schema StructType approach, we had to convert each tuple into a Row, whereas in the Seq-of-columns approach this step was not required.
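For reference, here is a minimal self-contained sketch that ties both approaches together. It assumes a local SparkSession (the `master("local[*]")` setting and the application name `RddToDf` are illustrative choices, not from the original post); the data and column names follow the examples above:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RddToDf {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; in a real job the master and app name will differ
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RddToDf")
      .getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    val dummyRDD = sc.parallelize(Seq(
      ("1001", "Ename1", "Designation1", "Manager1"),
      ("1003", "Ename2", "Designation2", "Manager2"),
      ("1001", "Ename3", "Designation3", "Manager3")
    ))

    // Approach 1: explicit schema + Row conversion
    val schema = StructType(Seq(
      StructField("empno", StringType, true),
      StructField("ename", StringType, true),
      StructField("designation", StringType, true),
      StructField("manager", StringType, true)
    ))
    val df1 = spark.createDataFrame(
      dummyRDD.map(d => Row(d._1, d._2, d._3, d._4)),
      schema
    )

    // Approach 2: toDF with column names (schema types are inferred)
    val df2 = dummyRDD.toDF("empno", "ename", "designation", "manager")

    df1.printSchema()
    df2.show()
    spark.stop()
  }
}
```

Note that in Approach 2 the column types are inferred from the tuple element types, while Approach 1 lets you state them (and their nullability) explicitly.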