Requirement
In this post, we will learn how to extract a single value from a row of a Spark data frame. Selecting or aggregating a column does not return a plain value; the result is another data frame whose rows are Row objects.
For example, suppose we have a data frame with personal details such as id, name, and location. If we ask for the max of id, or for a person's name with a filter, we get a result like this:
+-------+
|max(id)|
+-------+
|   1500|
+-------+
Solution
Creating a data frame with some sample data:
val df = spark.createDataFrame(Seq(
  (1100, "Person1", "Location1", null),
  (1200, "Person2", "Location2", "Contact2"),
  (1300, "Person3", "Location3", null),
  (1400, "Person4", null, "Contact4"),
  (1500, "Person5", "Location4", null)
)).toDF("id", "name", "location", "contact")
# Get max ID from the Data frame
import org.apache.spark.sql.functions.max
val maxId = df.agg(max(df("id")))
As you can see, the result is still a data frame. To print the value or use it as a variable, we need to pull it out of the row:
println(maxId.first.getInt(0))
Here, first returns a Row object, and getInt(0) reads the value at index 0 of that row as an Int.
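As a minimal sketch of the same idea, the snippet below reads the aggregated value in three ways: by index, by column name, and with a null check. It is written for spark-shell or a Scala script. The local SparkSession setup, the reduced two-column sample data, and the max_id alias are assumptions made for illustration and are not part of the example above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-value-sketch")
  .getOrCreate()
import spark.implicits._

// Reduced sample data (hypothetical subset of the data frame above).
val df = Seq((1100, "Person1"), (1500, "Person5")).toDF("id", "name")

// Alias the aggregate so the column can be read by name (max_id is an assumed name).
val row = df.agg(max($"id").as("max_id")).first()

// Read by position, as in the example above.
val byIndex: Int = row.getInt(0)

// Read by column name instead of position.
val byName: Int = row.getAs[Int]("max_id")

// Guard against a null cell before calling a primitive getter.
val safe: Option[Int] = if (row.isNullAt(0)) None else Some(row.getInt(0))
```

Reading by name tends to survive column reordering better than reading by index, while the isNullAt check matters for columns such as location or contact that contain nulls.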
# Get a specific person name by Id
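// $"..." needs import spark.implicits._ (already available in spark-shell)
val name = df.filter($"id" === 1200).select($"name")
name.first.getString(0)
Note: Instead of first, you can also use head; both return the first Row.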
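Both first and head throw a NoSuchElementException when the filter matches no rows. One possible way to handle that case, sketched here under assumptions, is to take at most one row and wrap the result in an Option. The helper nameFor, the local SparkSession, the reduced sample data, and the id 9999 are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("empty-result-sketch")
  .getOrCreate()
import spark.implicits._

// Reduced sample data (hypothetical subset of the data frame above).
val df = Seq((1200, "Person2"), (1300, "Person3")).toDF("id", "name")

// Hypothetical helper: returns None when no row matches or the name is null.
def nameFor(id: Int): Option[String] =
  df.filter($"id" === id)
    .select($"name")
    .take(1)                                  // Array[Row] with zero or one element
    .headOption                               // None when the array is empty
    .flatMap(row => Option(row.getString(0))) // None when the cell is null

val found = nameFor(1200)   // a row exists for this id
val missing = nameFor(9999) // no row has this id
```

With this shape, callers can use getOrElse or pattern matching instead of catching an exception.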
Wrapping Up
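Pulling a single value out of a data frame is a very common use case. The Row object has similar getters for other data types, such as getLong, getDouble, and getBoolean, which you can explore in the same way.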