Requirement
In this post, we will learn how to select a specific column, multiple columns, or all the columns of a Spark DataFrame using different approaches.
Sample Data
empno | ename | designation | manager | hire_date | sal | deptno | location |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
Solution
Create Dataframe with sample data
val empDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/Users/dipak_shaw/bdp/data/emp_data1.csv")
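If the CSV file is not at hand, the same rows can be built in memory. The following is a minimal sketch, assuming a local SparkSession and keeping hire_date as a plain string; the application name and the choice of column types are assumptions, not something the CSV reader is guaranteed to infer.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-columns-sample") // hypothetical name
  .getOrCreate()
import spark.implicits._ // provides toDF on a Seq of tuples

// Same rows as the sample data above; hire_date is kept as a string.
val empDf = Seq(
  (9369, "SMITH",  "CLERK",    7902, "12/17/1980", 800,  20, "BANGALORE"),
  (9499, "ALLEN",  "SALESMAN", 7698, "2/20/1981",  1600, 30, "HYDERABAD"),
  (9521, "WARD",   "SALESMAN", 7698, "2/22/1981",  1250, 30, "PUNE"),
  (9566, "TURNER", "MANAGER",  7839, "4/2/1981",   2975, 20, "MUMBAI"),
  (9654, "MARTIN", "SALESMAN", 7698, "9/28/1981",  1250, 30, "CHENNAI"),
  (9369, "SMITH",  "CLERK",    7902, "12/17/1980", 800,  20, "KOLKATA")
).toDF("empno", "ename", "designation", "manager", "hire_date", "sal", "deptno", "location")

empDf.printSchema() // check column names and types
empDf.show()        // display the rows
```

Calling printSchema before running any select is a quick way to confirm that the column names match the ones used in the examples.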
Select a specific column
Using the col function (from org.apache.spark.sql.functions)
empDf.select(col("ename")).show
Using the “$” expression (needs import spark.implicits._)
empDf.select($"ename").show
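Besides col and $, a column can also be referenced by its plain name or through the DataFrame itself. The sketch below compares the four forms and then collects the values to the driver, since select only returns a new DataFrame. It assumes a local SparkSession and uses a trimmed-down, three-column stand-in for empDf; both are illustrative choices.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-one-column") // hypothetical name
  .getOrCreate()
import spark.implicits._ // enables $"...", toDF and .as[String]

// Trimmed-down stand-in for empDf (assumption: three columns are enough here).
val empDf = Seq(
  (9369, "SMITH", 800),
  (9499, "ALLEN", 1600),
  (9521, "WARD", 1250)
).toDF("empno", "ename", "sal")

val byCol    = empDf.select(col("ename"))   // functions.col
val byDollar = empDf.select($"ename")       // $ interpolator from spark.implicits._
val byName   = empDf.select("ename")        // plain column name
val byApply  = empDf.select(empDf("ename")) // column resolved against this DataFrame

// To work with the values themselves, convert to a typed Dataset and collect.
val names: Array[String] = byCol.as[String].collect()
names.foreach(println)
```

The plain-name form is the shortest, while col and $ return Column objects that can be combined with expressions such as alias or arithmetic.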
Select multiple columns
Using the col function
empDf.select(col("empno"), col("ename")).show
Using the “$” expression
empDf.select($"empno", $"ename").show
Select all columns
Using the “*” expression
empDf.select("*").show
Using head & tail
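import spark.implicits._
import org.apache.spark.sql.functions._
// select(col: String, cols: String*) takes the first name separately and the rest as varargs
val cols = empDf.columns.toSeq
empDf.select(cols.head, cols.tail: _*).show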
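The head and tail pattern is useful whenever the column names arrive as a list. As a rough sketch, the same list can instead be mapped to Column objects and passed to the select(cols: Column*) overload in one go. The wanted list, the application name, and the four-column stand-in for empDf are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-from-list") // hypothetical name
  .getOrCreate()
import spark.implicits._ // provides toDF

// Trimmed-down stand-in for empDf (assumption: four columns are enough here).
val empDf = Seq(
  (9369, "SMITH", 800, 20),
  (9499, "ALLEN", 1600, 30)
).toDF("empno", "ename", "sal", "deptno")

// Hypothetical list of wanted columns, e.g. read from configuration.
val wanted = Seq("empno", "ename", "deptno")

// String overload: the first name is passed on its own, the rest as varargs.
val viaHeadTail = empDf.select(wanted.head, wanted.tail: _*)

// Column overload: map every name to a Column and expand the whole list.
val viaColumns = empDf.select(wanted.map(col): _*)

// All columns, in their original order.
val allCols = empDf.select(empDf.columns.map(col): _*)

viaHeadTail.show()
viaColumns.show()
allCols.show()
```

The Column form also works for an empty list, whereas calling head on an empty list throws an exception.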
Wrapping Up
In this post, we have learned how to select a single column, multiple columns, or all the columns of a DataFrame using the col function, the $ expression, and the other forms of select shown above.
You can also check the related post on selectExpr.