Q.1 There is a JSON file with the following content:
{"dept_id":101,"e_id":[10101,10102,10103]}
{"dept_id":102,"e_id":[10201,10202]}
The data is loaded into a Spark DataFrame, say mydf, with the following dtypes:
dept_id: bigint, e_id: array<bigint>
What is the best way to get each e_id individually, paired with its dept_id?
Answer :
We can use the explode function, which produces one output row per element of the e_id array.
The code would be:
mydf.withColumn("e_id", explode($"e_id"))
Here the new column reuses the old column name, so the dtypes of the resulting DataFrame (say opdf) will be
dept_id: bigint, e_id: bigint
So the output would look like:
+-------+-----+
|dept_id| e_id|
+-------+-----+
|    101|10101|
|    101|10102|
|    101|10103|
|    102|10201|
|    102|10202|
+-------+-----+
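As a minimal end-to-end sketch (assuming a local SparkSession; the file name "depts.json" is a placeholder for wherever the JSON above is stored):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder()
  .appName("ExplodeExample")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// spark.read.json expects line-delimited JSON by default,
// which matches the file content shown above
val mydf = spark.read.json("depts.json")

// explode produces one output row per element of the e_id array
val opdf = mydf.withColumn("e_id", explode($"e_id"))
opdf.show()
```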
Q.2 How many columns will be present in df2, if df1 has three columns a1, a2, a3?
var df2 = df1.withColumn("b1", lit("a1")).withColumn("a1", lit("a2")).withColumn("a2", $"a2").withColumn("b2", $"a3").withColumn("a3", lit("b1"))
Answer :
Five in total. withColumn adds a new column only when the name does not already exist; otherwise it replaces the existing column. Tracing step by step:
df1                             // a1, a2, a3
  .withColumn("b1", lit("a1"))  // a1, a2, a3, b1  (b1 is new)
  .withColumn("a1", lit("a2"))  // a1, a2, a3, b1  (a1 replaced, no new column)
  .withColumn("a2", $"a2")      // a1, a2, a3, b1  (a2 replaced with itself)
  .withColumn("b2", $"a3")      // a1, a2, a3, b1, b2  (b2 is new)
  .withColumn("a3", lit("b1"))  // a1, a2, a3, b1, b2  (a3 replaced)
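The trace can be verified by inspecting df2.columns. A small sketch, assuming a SparkSession named spark is already available and df1 is built inline for illustration:

```scala
import org.apache.spark.sql.functions.lit
import spark.implicits._

// One-row DataFrame just to get the three starting columns
val df1 = Seq((1, 2, 3)).toDF("a1", "a2", "a3")

val df2 = df1
  .withColumn("b1", lit("a1"))
  .withColumn("a1", lit("a2"))
  .withColumn("a2", $"a2")
  .withColumn("b2", $"a3")
  .withColumn("a3", lit("b1"))

println(df2.columns.length)        // 5
println(df2.columns.mkString(",")) // a1,a2,a3,b1,b2
```

Note that replacing a column keeps its original position in the schema; only genuinely new columns (b1, b2) are appended at the end.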
Q.3 How to get an RDD paired with its element indices?
Say myrdd = (a1, b1, c1, s2, s5)
The output should be
((a1,0), (b1,1), (c1,2), (s2,3), (s5,4))
Answer :
We can use the zipWithIndex function:
val myrdd_windx = myrdd.zipWithIndex()
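A short sketch, assuming a SparkContext named sc is already available:

```scala
val myrdd = sc.parallelize(Seq("a1", "b1", "c1", "s2", "s5"))

// zipWithIndex pairs each element with its 0-based Long index.
// When the RDD has more than one partition, this triggers a Spark
// job first, because the sizes of the preceding partitions must be
// known to compute each element's global index.
val myrdd_windx = myrdd.zipWithIndex()

myrdd_windx.collect().foreach(println)
// (a1,0)
// (b1,1)
// (c1,2)
// (s2,3)
// (s5,4)
```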