Requirement: Suppose we have a text data file that contains employees' basic details. When we load this file in Spark, it returns an RDD. Our requirement is to find the number of partitions created just after loading the data file and to see what records are stored…
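As a rough illustration of what inspecting partitions means, the sketch below uses plain Python rather than Spark. The function name `split_into_partitions`, the sample employee lines, and the partition count are all hypothetical. Spark chooses partition boundaries from input split sizes, which this sketch does not model; it only shows the idea of counting partitions and listing the records held in each.

```python
def split_into_partitions(records, num_partitions):
    """Split records into contiguous chunks, one list per partition (illustrative only)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, extra = divmod(len(records), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional record each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(records[start:end])
        start = end
    return partitions


# Hypothetical employee records (not taken from the original post).
employees = [
    "1,Alice,Engineer",
    "2,Bob,Analyst",
    "3,Carol,Manager",
    "4,Dave,Engineer",
    "5,Eve,Analyst",
]

partitions = split_into_partitions(employees, 2)
print("Number of partitions:", len(partitions))
for index, part in enumerate(partitions):
    print(index, part)
```

In PySpark itself, the corresponding calls on an RDD are `rdd.getNumPartitions()` for the count and `rdd.glom().collect()` for the records grouped by partition.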

Requirement: In real-world scenarios, data files contain many records, and there may be many data files available. In that case, it helps to find a suitable approach for producing the output. Here, we want the total number of records available in the data files. So the requirement is how…
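A minimal sketch of counting records across several files, written in plain Python and not the approach from the original post. It assumes one record per line and skips blank lines; the function name `count_records` and the two sample files are hypothetical and are created in a temporary directory only so the example can run on its own.

```python
import os
import tempfile


def count_records(paths):
    """Return the total number of non-empty lines across all given files."""
    total = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            total += sum(1 for line in handle if line.strip())
    return total


# Create two small hypothetical data files for demonstration.
tmp_dir = tempfile.mkdtemp()
sample_files = {
    "part1.txt": "1,Alice\n2,Bob\n",
    "part2.txt": "3,Carol\n4,Dave\n5,Eve\n",
}
paths = []
for name, content in sample_files.items():
    path = os.path.join(tmp_dir, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)
    paths.append(path)

total_records = count_records(paths)
print("Total records:", total_records)
```

Reading each file as a stream keeps memory use flat even when individual files are large.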

Requirement: Suppose you get data files containing users' basic information, such as first name, last name, designation, and city. These details are separated by a ‘,’ delimiter. The requirement is to find all the duplicate values of any given field. So, here the requirement is…
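One way to picture the duplicate check is the plain-Python sketch below. It is an assumption-laden illustration, not the original post's solution: the helper `find_duplicates`, the field order (first name, last name, designation, city), and the sample rows are all hypothetical. It counts how often each value appears in one chosen column and keeps the values seen more than once.

```python
from collections import Counter


def find_duplicates(lines, field_index, delimiter=","):
    """Return {value: count} for values that occur more than once in one field."""
    counts = Counter()
    for line in lines:
        fields = line.split(delimiter)
        # Skip malformed rows that do not have the requested field.
        if field_index < len(fields):
            counts[fields[field_index].strip()] += 1
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical user records: first name, last name, designation, city.
users = [
    "John,Smith,Engineer,Pune",
    "Mary,Jones,Analyst,Delhi",
    "John,Brown,Manager,Pune",
    "Alan,Smith,Engineer,Mumbai",
]

duplicate_first_names = find_duplicates(users, 0)
duplicate_cities = find_duplicates(users, 3)
print("Duplicate first names:", duplicate_first_names)
print("Duplicate cities:", duplicate_cities)
```

Passing a different `field_index` checks another column without changing the function.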

Requirement: Suppose you have a file full of content in which many words are repeated. The requirement is to get the distinct words from the file using MapReduce. Compared with SQL, we have to write a MapReduce program that is similar…
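The distinct-words idea can be sketched by imitating the map, shuffle, and reduce phases in plain Python. This is a simulation under stated assumptions, not a Hadoop job: the function names, the sample lines, splitting on whitespace, and lowercasing words before comparison are all choices made here for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, None) pair for each whitespace-separated word."""
    for word in line.split():
        # Lowercasing is an assumption so that "Quick" and "quick" match.
        yield word.lower(), None


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Emit each key once, ignoring how many times it occurred."""
    return key


# Hypothetical input lines.
lines = ["the quick brown fox", "the lazy dog", "Quick fox"]

pairs = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(pairs)
distinct_words = sorted(reduce_phase(key, values) for key, values in grouped.items())
print(distinct_words)
```

Because the shuffle groups identical keys together, the reducer only has to emit each key once; the values carry no information and are left as `None`.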