Spark Interview Questions – Part 2

Q1 : What is the use of query.awaitTermination() in Structured Streaming?

Answer : In batch processing, we load a bounded dataset, process it, and write the result once. In real-time streaming, data arrives continuously in micro-batches, usually driven by the trigger's processing time, so the streaming query must keep running until it is terminated, either by a failure or by an explicit stop. query.awaitTermination() blocks the driver so it waits for that termination instead of exiting while the query is still processing incoming data.

Q2 : Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. What is the manual way of doing it?

Answer : Calling RDD.unpersist() removes the RDD's cached (persisted) partitions from memory and disk.

Q3 : In Spark SQL, what would be the output of the below query?

SELECT true <=> NULL;

Answer : false. The <=> operator is null-safe equality: unlike the ordinary = operator, it never returns NULL itself, so comparing any non-NULL value with NULL yields false (and NULL <=> NULL yields true).

Q4 : Is it possible to load images into a Spark DataFrame?

Answer : Yes, using the built-in image data source:

spark.read.format("image").load("path of image")

The resulting DataFrame has a fixed schema: a single struct column named image whose fields include origin, height, width, nChannels, mode, and data, where data holds the raw pixel bytes in binary format.

Q5 : In your Spark application there is a dependency on a class that lives in a jar file (abcd.jar) somewhere on your cluster. You don't have a fat jar for your application. How would you use it?

Answer : While submitting the job with spark-submit, we can pass abcd.jar via the --jars option. Spark then distributes the jar to the driver and executors and adds it to their classpaths, so the main application can use the class.

And if we want to use it in an already-running spark-shell, the Scala REPL's :require command adds the jar to the session classpath:

:require abcd.jar
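A sketch of the spark-submit invocation (the main class, master, and paths below are placeholders, not taken from the question):

```shell
# --jars ships extra jars with the application: Spark copies them to the
# cluster and puts them on the driver and executor classpaths.
# com.example.MainApp, /opt/libs/abcd.jar and my-app.jar are hypothetical.
spark-submit \
  --class com.example.MainApp \
  --master yarn \
  --jars /opt/libs/abcd.jar \
  my-app.jar
```

Several dependency jars can be passed to --jars as a comma-separated list; the same flag also works when launching spark-shell.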
