Spark Interview Questions – Part 2

Q1 : What is the use of query.awaitTermination() in Structured Streaming?

Answer : In batch processing, we load a bounded dataset, process it, and write the result once. In real-time streaming, data arrives continuously in micro-batches, usually driven by the trigger's processing time, so the streaming query must keep running until it is terminated, either by a failure or by an explicit stop. query.awaitTermination() blocks the driver so it waits for that termination instead of exiting while the query is still processing incoming data.

Q2 : Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. What is the manual way of doing it?

Answer : Calling RDD.unpersist() removes the RDD's cached (persisted) partitions from memory and disk.

Q3 : In Spark SQL, what would be the output of the below query?

SELECT true <=> NULL;

Answer : false. The <=> operator is null-safe equality: unlike the ordinary = operator, it never returns NULL itself, so comparing any non-NULL value with NULL yields false (and NULL <=> NULL yields true).

Q4 : Is it possible to load images into a Spark DataFrame?

Answer : Yes, using the built-in image data source:

spark.read.format("image").load("path of image")

The resulting DataFrame has a fixed schema: a single struct column named image whose fields include origin, height, width, nChannels, mode, and data, where data holds the raw pixel bytes in binary format.

Q5 : In your Spark application there is a dependency on a class that lives in a jar file (abcd.jar) somewhere on your cluster. You don't have a fat jar for your application. How would you use it?

Answer : While submitting the job with spark-submit, we can pass abcd.jar via the --jars option. Spark then distributes the jar to the driver and executors and adds it to their classpaths, so the main application can use the class.

And if we want to use it in an already-running spark-shell, the Scala REPL's :require command adds the jar to the session classpath:

:require abcd.jar
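A sketch of the spark-submit invocation (the main class, master, and paths below are placeholders, not taken from the question):

```shell
# --jars ships extra jars with the application: Spark copies them to the
# cluster and puts them on the driver and executor classpaths.
# com.example.MainApp, /opt/libs/abcd.jar and my-app.jar are hypothetical.
spark-submit \
  --class com.example.MainApp \
  --master yarn \
  --jars /opt/libs/abcd.jar \
  my-app.jar
```

Several dependency jars can be passed to --jars as a comma-separated list; the same flag also works when launching spark-shell.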
