Q.1 There is a JSON file with the following content:
{"dept_id":101,"e_id":[10101,10102,10103]}
{"dept_id":102,"e_id":[10201,10202]}
The data is loaded into a Spark DataFrame, say mydf, with the following dtypes:
dept_id: bigint, e_id: array<bigint>
What is the best way to get each e_id individually, paired with its dept_id?
Answer :
We can use the explode function, which produces one output row per element of the e_id array.
The code would be:
mydf.withColumn("e_id", explode($"e_id"))
Here the new column reuses the old column name, so the dtypes of the resulting DataFrame (say opdf) will be
dept_id: bigint, e_id: bigint
So the output would look like:
+-------+-----+
|dept_id| e_id|
+-------+-----+
|    101|10101|
|    101|10102|
|    101|10103|
|    102|10201|
|    102|10202|
+-------+-----+
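As a minimal end-to-end sketch (assuming a local SparkSession; the file name "depts.json" is a placeholder for wherever the JSON above is stored):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder()
  .appName("ExplodeExample")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// spark.read.json expects line-delimited JSON by default,
// which matches the file content shown above
val mydf = spark.read.json("depts.json")

// explode produces one output row per element of the e_id array
val opdf = mydf.withColumn("e_id", explode($"e_id"))
opdf.show()
```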
Q.2 How many columns will be present in df2, if df1 has three columns a1, a2, a3?
var df2 = df1.withColumn("b1", lit("a1")).withColumn("a1", lit("a2")).withColumn("a2", $"a2").withColumn("b2", $"a3").withColumn("a3", lit("b1"))
Answer :
Five in total. withColumn adds a new column only when the name does not already exist; otherwise it replaces the existing column. Tracing step by step:
df1                             // a1, a2, a3
  .withColumn("b1", lit("a1"))  // a1, a2, a3, b1  (b1 is new)
  .withColumn("a1", lit("a2"))  // a1, a2, a3, b1  (a1 replaced, no new column)
  .withColumn("a2", $"a2")      // a1, a2, a3, b1  (a2 replaced with itself)
  .withColumn("b2", $"a3")      // a1, a2, a3, b1, b2  (b2 is new)
  .withColumn("a3", lit("b1"))  // a1, a2, a3, b1, b2  (a3 replaced)
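The trace can be verified by inspecting df2.columns. A small sketch, assuming a SparkSession named spark is already available and df1 is built inline for illustration:

```scala
import org.apache.spark.sql.functions.lit
import spark.implicits._

// One-row DataFrame just to get the three starting columns
val df1 = Seq((1, 2, 3)).toDF("a1", "a2", "a3")

val df2 = df1
  .withColumn("b1", lit("a1"))
  .withColumn("a1", lit("a2"))
  .withColumn("a2", $"a2")
  .withColumn("b2", $"a3")
  .withColumn("a3", lit("b1"))

println(df2.columns.length)        // 5
println(df2.columns.mkString(",")) // a1,a2,a3,b1,b2
```

Note that replacing a column keeps its original position in the schema; only genuinely new columns (b1, b2) are appended at the end.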
Q.3 How to get an RDD paired with its element indices?
Say myrdd = (a1, b1, c1, s2, s5)
The output should be
((a1,0), (b1,1), (c1,2), (s2,3), (s5,4))
Answer :
We can use the zipWithIndex function:
val myrdd_windx = myrdd.zipWithIndex()
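A short sketch, assuming a SparkContext named sc is already available:

```scala
val myrdd = sc.parallelize(Seq("a1", "b1", "c1", "s2", "s5"))

// zipWithIndex pairs each element with its 0-based Long index.
// When the RDD has more than one partition, this triggers a Spark
// job first, because the sizes of the preceding partitions must be
// known to compute each element's global index.
val myrdd_windx = myrdd.zipWithIndex()

myrdd_windx.collect().foreach(println)
// (a1,0)
// (b1,1)
// (c1,2)
// (s2,3)
// (s5,4)
```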