Get column value from Data Frame as list in Spark

Requirement

In this post, we are going to extract a column's values from a DataFrame as a list in Spark. For example, suppose you have a DataFrame with some schema and would like to get the values of one column as a list for further processing.

Solution

Let’s create a DataFrame with some dummy data.

 val df = spark.createDataFrame(Seq(
   (1100, "Person1", "Location1", null),
   (1200, "Person2", "Location2", "Contact2"),
   (1300, "Person3", "Location3", null),
   (1400, "Person4", null, "Contact4"),
   (1500, "Person5", "Location4", null)
 )).toDF("id", "name", "location", "contact")

Here, we have four columns: id, name, location, and contact.

Explore Column Value

df.select("name")
df.select("name").rdd.map(r => r(0)).collect()

The first line of code returns a DataFrame containing only the name column, not the values themselves. The second line converts that DataFrame to an RDD of Row, extracts the first field of each Row, and collects the result to the driver as an array. Note that r(0) is typed as Any, so this gives an Array[Any]; to get an Array[String], use the typed getter r.getString(0) instead.
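As a sketch of the typed variants (assuming a running SparkSession named spark and the df created above):

```scala
// Assumes a SparkSession named `spark` and the `df` defined earlier.
import spark.implicits._

// r(0) is typed Any, so this collects an Array[Any]
val untyped: Array[Any] = df.select("name").rdd.map(r => r(0)).collect()

// The typed Row getter gives an Array[String]
val names: Array[String] = df.select("name").rdd.map(r => r.getString(0)).collect()

// Alternatively, the Dataset API avoids Row altogether
val namesDs: Array[String] = df.select("name").as[String].collect()
```

The Dataset form is usually preferred in newer code, since the cast to String happens once via the encoder rather than per Row.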

Wrapping Up

We have extracted a String column and returned it as an array of String. We can also cast the values to another data type using asInstanceOf[Data_Type] inside map, for example map(r => r(0).asInstanceOf[Int]).collect() for an Int column.
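A minimal sketch of the cast approach, assuming the same df as above (where the id column holds Ints and contact may contain nulls):

```scala
// Assumes the `df` created earlier; id is an Int column.
val ids: Array[Int] = df.select("id").rdd.map(r => r(0).asInstanceOf[Int]).collect()

// Equivalent, using the typed Row getter instead of a cast
val idsTyped: Array[Int] = df.select("id").rdd.map(r => r.getInt(0)).collect()

// For nullable columns such as contact, wrap in Option to stay null-safe
val contacts: Array[Option[String]] =
  df.select("contact").rdd.map(r => Option(r.getString(0))).collect()
```

Note that asInstanceOf throws a ClassCastException if the underlying type does not match, so the typed getters are the safer choice when the schema is known.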

