Big Data Engineering Interview Questions

1. A CSV file with a header row is present at an HDFS location. Which property needs to be set while reading it into a Spark DataFrame?

Answer: While reading the file into a DataFrame, set the header option to true, as shown below:

 val df1 = spark.read.option("header", true).csv("path")
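
With the header option set, the first row of the file is used for column names instead of being read as data. You can confirm this by printing the schema (the column names shown in the comment are illustrative):

 df1.printSchema()
 // root
 //  |-- id: string (nullable = true)
 //  |-- name: string (nullable = true)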

 

2. You want to load a CSV file into a Spark DataFrame, but you do not want Spark to infer the schema; instead, you have a custom schema based on your requirement. How would you create the custom schema and apply it to the DataFrame?

Answer: To create a schema, define a StructType and add the fields one by one using .add:

 import org.apache.spark.sql.types.{StructType, IntegerType, DoubleType, StringType}

 val my_csv_schema = new StructType()
   .add("id", IntegerType, true)
   .add("sal", DoubleType, true)
   .add("name", StringType, true)

Once the schema is created, pass it to the reader using .schema:

   val emp_data = spark.read.format("csv")
     .option("header", "true")
     .schema(my_csv_schema)
     .load("path_to_csv")
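
You can verify that the custom schema was applied (and that nothing was inferred) by printing it; the output simply mirrors the fields defined above:

 emp_data.printSchema()
 // root
 //  |-- id: integer (nullable = true)
 //  |-- sal: double (nullable = true)
 //  |-- name: string (nullable = true)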

 

3. In HBase, how do you check whether a table exists or not?

Answer: Use the exists command in the HBase shell to check whether the table exists:

 exists 'Table_Name'
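
If the check needs to happen from code rather than the shell, a minimal sketch using the HBase Java client API from Scala could look like the following (the configuration and table name are assumed to come from your environment):

 import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
 import org.apache.hadoop.hbase.client.ConnectionFactory

 val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
 val admin = connection.getAdmin
 // tableExists returns true if the table is present in HBase
 val tableExists = admin.tableExists(TableName.valueOf("Table_Name"))
 admin.close()
 connection.close()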

 

4. What is pre-splitting of an HBase table? Explain.

Answer: Refer to this post: https://bigdataprogrammers.com/pre-splitting-of-hbase-table/
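
In short, pre-splitting means creating the table with region split keys defined up front, so that initial writes are spread across multiple regions (and region servers) instead of hotspotting the single region a new table starts with. A minimal HBase shell sketch, with a hypothetical table name, column family, and split keys:

 create 'Table_Name', 'cf', SPLITS => ['10', '20', '30']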

 

5. Given a DataFrame df1, how would you cast all of its columns to string without hardcoding any column names?

Answer: Take the list of column names, map each one to a cast expression, and use it in a select, as below:

 import org.apache.spark.sql.functions.col

 val all_cols = df1.columns
 val all_cols_cast = all_cols.map(x => col(x).cast("string"))
 val df1_new = df1.select(all_cols_cast: _*)

df1_new will have all the columns with data type string. This is a generic way of doing it, since no column names are hardcoded.
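
You can confirm the result with df1_new.printSchema(); every column should now show string as its data type. An equivalent one-line alternative, sketched here under the assumption that the column names are plain identifiers (df1_new2 is just an illustrative name), uses selectExpr:

 val df1_new2 = df1.selectExpr(df1.columns.map(c => s"cast($c as string) as $c"): _*)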

If you need any help while coding and learning Spark, connect with our experts here: https://bigdataprogrammers.com/get-help-from-big-data-expert/

 
