Get value from a Row in Spark

Requirement

In this post, we will learn how to get or extract a value from a Row. When a query produces a single value, such as an aggregate or a filtered lookup, Spark still returns that value wrapped in a DataFrame of Row objects rather than as a plain value.

For example, suppose we have a DataFrame with personal details such as id, name, and location. If we compute the max of id, or look up a person's name with a filter, the result is a DataFrame like this:

+-------+
|max(id)|
+-------+
|   1500|
+-------+

Solution

Create a DataFrame with some sample data:

 val df = spark.createDataFrame(Seq(
(1100, "Person1", "Location1", null),
(1200, "Person2", "Location2", "Contact2"),
(1300, "Person3", "Location3", null),
(1400, "Person4", null, "Contact4"),
(1500, "Person5", "Location4", null)
)).toDF("id", "name", "location", "contact")

# Get the max ID from the DataFrame

import org.apache.spark.sql.functions.max
val maxId = df.agg(max(df("id")))

As you can see, the result is still a DataFrame, not a plain value. To print the value or store it in a variable, we need to extract it from the Row:

 println(maxId.first.getInt(0))

Here, first returns the first Row of the DataFrame, and getInt(0) reads the value at index 0 of that Row as an Int.
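Reading by index is not the only option. The sketch below, written as a Scala script (or spark-shell paste) with Spark on the classpath, reads the same max value three ways: by position, by column name with getAs, and through a typed Dataset. The local[*] master and the app name are assumptions for a standalone run; in spark-shell, getOrCreate() just returns the existing spark session. The data is the sample above, trimmed to the columns needed here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: a local session for a standalone run; in spark-shell,
// getOrCreate() returns the already running `spark` session.
val spark = SparkSession.builder().master("local[*]").appName("row-value-sketch").getOrCreate()
import spark.implicits._ // enables toDF, $"col" and the Int encoder used by .as[Int]

// Sample data from the post, reduced to the columns this sketch needs.
val df = Seq(
  (1100, "Person1"),
  (1200, "Person2"),
  (1300, "Person3"),
  (1400, "Person4"),
  (1500, "Person5")
).toDF("id", "name")

val maxIdDf = df.agg(max($"id"))

// 1. By position, as in the post: first() returns a Row, getInt(0) reads column 0.
val byIndex: Int = maxIdDf.first().getInt(0)

// 2. By column name: Spark names the aggregate column "max(id)".
val byName: Int = maxIdDf.first().getAs[Int]("max(id)")

// 3. As a typed Dataset: .as[Int] maps the single column to Int,
//    so first() returns an Int instead of a Row.
val typed: Int = maxIdDf.as[Int].first()

println(s"$byIndex $byName $typed")
```

Reading by name survives a change in column order, while the typed Dataset avoids handling Row at all; which one fits best depends on how the value is used afterwards.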

# Get a specific person's name by ID

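import spark.implicits._ // for the $"column" syntax (already imported in spark-shell)
// Filter on id first, then keep only the name column
val name = df.filter($"id" === 1200).select($"name")
name.first.getString(0)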

Note: Instead of first, you can also use head(); both return the first Row of the DataFrame.
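Two edge cases are worth handling in a lookup like this: first (and head) throw a java.util.NoSuchElementException when the filter matches no rows, and getString returns null when the stored value is null, as it is for several contact entries in the sample data. The sketch below is one way to guard against both; the helper name nameById is hypothetical, not part of Spark, and the local session settings are assumptions for a standalone run.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for a standalone run (reused in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("row-lookup-sketch").getOrCreate()
import spark.implicits._

// Same sample data as the post; the explicit tuple type makes every null a String column value.
val df = Seq[(Int, String, String, String)](
  (1100, "Person1", "Location1", null),
  (1200, "Person2", "Location2", "Contact2"),
  (1300, "Person3", "Location3", null),
  (1400, "Person4", null, "Contact4"),
  (1500, "Person5", "Location4", null)
).toDF("id", "name", "location", "contact")

// Hypothetical helper: None when no row matches or when the name itself is null.
def nameById(data: DataFrame, id: Int): Option[String] =
  data.filter($"id" === id)
    .select($"name")
    .take(1)                                   // Array[Row] with 0 or 1 element; does not throw
    .headOption
    .flatMap(row => Option(row.getString(0)))  // Option(null) becomes None

val found = nameById(df, 1200)
val missing = nameById(df, 9999)

// isNullAt checks for a null value before reading it.
val contactRow = df.filter($"id" === 1100).select($"contact").first()
val contactIsNull = contactRow.isNullAt(0)

println(s"found=$found missing=$missing contactIsNull=$contactIsNull")
```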

Wrapping Up

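Extracting a single value from a DataFrame is a very common need. Besides getInt and getString, Row provides typed getters for other data types, such as getLong, getDouble, and the generic getAs[T].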
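As a short illustration of other types, the sketch below runs one aggregation that yields three differently typed columns and reads each with the matching getter: count produces a LongType column, avg a DoubleType column, and max over strings a StringType column. The session settings are assumptions for a standalone run, and the data is the post's sample trimmed to id and name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, max}

// Assumption: a local session for a standalone run (reused in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("row-types-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  (1100, "Person1"),
  (1200, "Person2"),
  (1300, "Person3"),
  (1400, "Person4"),
  (1500, "Person5")
).toDF("id", "name")

// One result Row holding three aggregates, in the order they are listed.
val stats = df.agg(count($"id"), avg($"id"), max($"name")).first()

val total: Long = stats.getLong(0)       // count(...) is a LongType column
val mean: Double = stats.getDouble(1)    // avg(...) is a DoubleType column
val maxName: String = stats.getString(2) // max over strings compares lexicographically

println(s"total=$total mean=$mean maxName=$maxName")
```

Calling a getter that does not match the column type (for example getInt on the count column) fails at runtime with a ClassCastException, so it helps to check df.schema or the aggregate's result type first.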
