Requirement
In this post, we are going to extract the values of a column from a DataFrame as a List in Spark. Suppose you have a DataFrame with some schema and you would like to get the values of one of its columns as a list for further processing.
Solution
Let’s create a data frame with some dummy data.
val df = spark.createDataFrame(Seq(
(1100, "Person1", "Location1", null),
(1200, "Person2", "Location2", "Contact2"),
(1300, "Person3", "Location3", null),
(1400, "Person4", null, "Contact4"),
(1500, "Person5", "Location4", null)
)).toDF("id", "name", "location", "contact")
Here, we have four columns: id, name, location, and contact.
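To double-check the structure before extracting anything, you can print the schema (a quick sanity check; this assumes the `df` defined above and an active SparkSession):

```scala
// Inspect the schema Spark inferred from the Seq of tuples:
// id becomes an integer column, the others become string columns.
df.printSchema()

// And preview the rows themselves.
df.show()
```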
Explore Column Value
df.select("name")
df.select("name").rdd.map(r => r(0)).collect()
The first line returns a DataFrame containing only the name column; nothing is brought to the driver yet. The second line collects the column's values to the driver as an array. Because r(0) is typed as Any, the result is an Array[Any] holding the string values.
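If you want a properly typed result rather than Array[Any], there are two common alternatives (a sketch, assuming the `df` defined above and a SparkSession named `spark` in scope):

```scala
// Needed for the .as[String] typed conversion below.
import spark.implicits._

// Option 1: use the typed getter on Row -- yields Array[String].
val names1 = df.select("name").rdd.map(r => r.getString(0)).collect()

// Option 2: convert to a typed Dataset and collect -- also Array[String].
val names2 = df.select("name").as[String].collect()

// Call .toList if a scala.List is the collection you actually need.
val nameList = names2.toList
```

Both approaches avoid the cast-later problem: the element type is fixed at extraction time instead of being carried around as Any.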
Wrapping Up
We worked with a String column here and returned its values as an Array of String. For other column types, you can cast inside the map using asInstanceOf[Data_Type], for example df.select("id").rdd.map(r => r(0).asInstanceOf[Int]).collect(), which returns an Array[Int].
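One caveat worth sketching: asInstanceOf fails at runtime when the value is null, so it is only safe on non-nullable columns. A hedged example, based on the DataFrame above, for both the safe case and a nullable column:

```scala
// id contains no nulls, so a direct cast to Int is safe here.
val ids = df.select("id").rdd.map(r => r(0).asInstanceOf[Int]).collect()

// contact is nullable: wrap each value in Option instead of casting,
// then flatten to keep only the non-null contacts.
val contacts = df.select("contact").rdd
  .map(r => Option(r.getString(0)))
  .collect()
  .flatten
```

Using Option(...) turns each null into None, so flatten silently drops the missing values rather than throwing a NullPointerException mid-job.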