SELECT in Spark DataFrame

Requirement

In this post, we will learn how to select a specific column, multiple columns, or all the columns of a Spark DataFrame using different approaches.

Sample Data

empno  ename   designation  manager  hire_date   sal   deptno  location
9369   SMITH   CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN   SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
9521   WARD    SALESMAN     7698     2/22/1981   1250  30      PUNE
9566   TURNER  MANAGER      7839     4/2/1981    2975  20      MUMBAI
9654   MARTIN  SALESMAN     7698     9/28/1981   1250  30      CHENNAI
9369   SMITH   CLERK        7902     12/17/1980  800   20      KOLKATA

Solution

Create DataFrame with sample data

val empDf = spark.read.option("header", "true")
                      .option("inferSchema", "true")
                      .csv("/Users/dipak_shaw/bdp/data/emp_data1.csv")
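
The examples that follow use the col function and the $ column syntax. Both need the imports below to be in scope (they are repeated later in the head & tail example):

import spark.implicits._
import org.apache.spark.sql.functions._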

Select a specific column

Using COL function

empDf.select(col("ename")).show

Using “$” expression

empDf.select($"ename").show
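
As a side note, select also accepts plain column-name strings, which is the overload the head & tail example at the end of this post relies on:

empDf.select("ename").show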

Select multiple columns

Using COL function

empDf.select(col("empno"), col("ename")).show

Using “$” expression

empDf.select($"empno", $"ename").show

Using “*” expression

empDf.select("*").show

Using head & tail

import spark.implicits._
import org.apache.spark.sql.functions._

// Select all columns by passing the first column name and the rest as varargs
val cols = empDf.columns.toSeq
empDf.select(cols.head, cols.tail: _*).show
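
The head/tail split is needed because this overload of select takes the first column name as a String and the remaining names as varargs. An equivalent way to select every column, sketched below, is to turn each name into a Column and pass the whole sequence as varargs:

empDf.select(cols.map(col): _*).show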

Wrapping Up

In this post, we have learned how to fetch a specific column or multiple columns from a DataFrame using the COL function or the $ expression in SELECT.

You can check the post related to SELECTExpr here.
