Requirement: You have sample data for some students and you want to create a data frame to perform some operations on it. Given: Sample data: 101, "alex", 88.56; 102, "john", 68.32; 103, "peter", 75.62; 104, "jeff", 92.67; 105, "mathew", 89.56; 106, "alan", 72.57; 107, "steve", 96.12; 108, "mark", 98.45; 109, "adam", 76.25; 109, "david", 78.45. Solution: Step 1: go to spark Read More →
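As a sketch of the first step, the sample records can be parsed into a case class in plain Scala; this is the shape you would then hand to `toDF()` in spark-shell. The `Student` case class name is made up for illustration.

```scala
// Hypothetical sketch: parse the sample student records into a case class,
// the shape you would pass to toDF() / createDataFrame in spark-shell.
case class Student(rollNo: Int, name: String, marks: Double)

val sample = Seq(
  """101, "alex",88.56""",
  """102, "john",68.32""",
  """103, "peter",75.62"""
)

// Split each record on commas, strip quotes and surrounding whitespace.
val students = sample.map { line =>
  val parts = line.split(",").map(_.trim)
  Student(parts(0).toInt, parts(1).replaceAll("\"", ""), parts(2).toDouble)
}

// In spark-shell this Seq becomes a data frame with:
//   import spark.implicits._
//   val df = students.toDF()
println(students)
```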

Requirement Let’s say we have a set of data in JSON format. The file may contain records either one per line or spanning multiple lines. The requirement is to process this data using a Spark data frame. In addition to this, we will also see how to Read More →
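A minimal sketch of the two cases, assuming Spark (the spark-sql dependency) is on the classpath: by default `spark.read.json` expects one JSON record per line, while a record spanning several lines needs the `multiLine` option. The temp files here stand in for real input paths.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

// Assumes the spark-sql dependency is available; local mode for illustration.
val spark = SparkSession.builder().appName("json-demo").master("local[*]").getOrCreate()

// Single-line JSON: one record per line, the default expectation of spark.read.json.
val singleLine = Files.createTempFile("single", ".json")
Files.write(singleLine, "{\"id\":1,\"name\":\"alex\"}\n{\"id\":2,\"name\":\"john\"}\n".getBytes)
val df1 = spark.read.json(singleLine.toString)

// Multi-line JSON: records spanning several lines need multiLine = true.
val multiLine = Files.createTempFile("multi", ".json")
Files.write(multiLine, "[\n {\"id\": 1, \"name\": \"alex\"},\n {\"id\": 2, \"name\": \"john\"}\n]\n".getBytes)
val df2 = spark.read.option("multiLine", true).json(multiLine.toString)

val singleCount = df1.count()
val multiCount  = df2.count()
spark.stop()
```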

Requirement The spark-shell is an environment where we can run Spark Scala code and see the output on the console as each line executes. It is a more interactive environment. But when we have many lines of code, we prefer to write them in a file and Read More →
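As an illustration, the snippet below is the kind of plain Scala you would save to a file and run inside spark-shell; the file name and path are hypothetical, and the `:load` / `-i` invocations shown in the comments are the usual ways to feed a file to the shell.

```scala
// Contents of a hypothetical file, e.g. wordcount.scala.
// In spark-shell you would run it with:
//   scala> :load /path/to/wordcount.scala
// or non-interactively with:
//   spark-shell -i /path/to/wordcount.scala

val lines = Seq("spark makes big data simple", "spark shell is interactive")

// Classic word count, written in the same map/reduce style used on RDDs.
val counts = lines
  .flatMap(_.split("\\s+"))
  .groupBy(identity)
  .map { case (word, occs) => (word, occs.size) }

println(counts)
```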

Requirement You have two tables named A and B, and you want to perform all types of joins in Spark using Scala. This will help you understand how joins work in Spark Scala. Solution Step 1: Input Files Download file  and  from here, and place them into a local Read More →
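To make the join types concrete before running them in Spark, here is a plain-Scala sketch of what each one computes on two small keyed tables; the table contents are illustrative. In Spark the same results come from `dfA.join(dfB, Seq("id"), joinType)` with join types like `"inner"`, `"left_outer"`, `"right_outer"`, and `"full_outer"`.

```scala
// Tables A and B as (key, value) pairs; contents are illustrative.
val A = Seq(1 -> "a1", 2 -> "a2", 3 -> "a3")
val B = Seq(2 -> "b2", 3 -> "b3", 4 -> "b4")

// Inner join: only keys present in both tables.
val inner = for { (k, a) <- A; (k2, b) <- B if k == k2 } yield (k, a, b)

// Left outer join: every row of A, with B's value when the key matches.
val leftOuter = A.map { case (k, a) =>
  (k, a, B.collectFirst { case (`k`, b) => b })
}

// Full outer join covers the union of both tables' keys.
val fullOuterKeys = A.map(_._1).toSet union B.map(_._1).toSet

println(inner)
println(leftOuter)
```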

Requirement: You have the marks of all the students in a class and you want to find each student's rank using Scala. Given: A pipe-separated file containing roll numbers and marks of students; below are sample values: R_no marks 101 389 102 412 103 435 Read More →
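The ranking logic can be sketched in plain Scala: sort by marks descending and number the rows from 1. The same `sortBy`/`zipWithIndex` pattern carries over to a Spark RDD; the sample values below follow the post.

```scala
// Pipe-separated records of roll number and marks, as in the sample.
val records = Seq("101|389", "102|412", "103|435")

val marks = records.map { line =>
  val parts = line.split("\\|")
  (parts(0).toInt, parts(1).toInt)
}

// Rank 1 goes to the highest marks: sort descending, then number from 1.
val ranked = marks
  .sortBy { case (_, m) => -m }
  .zipWithIndex
  .map { case ((roll, m), idx) => (roll, m, idx + 1) }

println(ranked)
```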

Requirement Suppose we have a dataset in CSV format. We want to read the file in Spark using Scala. So the requirement is to create a Spark application which reads a CSV file into a Spark data frame using Scala. Components Involved The following components are involved: Spark RDD/Data Frame, Scala Read More →
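A minimal sketch of the read itself, assuming the spark-sql dependency is available; the file contents and column names here are made up, and a temp file stands in for the real input path. `header` uses the first line as column names, and `inferSchema` lets Spark detect numeric columns.

```scala
import org.apache.spark.sql.SparkSession
import java.nio.file.Files

// Assumes Spark is on the classpath; local mode for illustration.
val spark = SparkSession.builder().appName("csv-demo").master("local[*]").getOrCreate()

// A stand-in for the real CSV file.
val csv = Files.createTempFile("students", ".csv")
Files.write(csv, "roll,name,marks\n101,alex,88.56\n102,john,68.32\n".getBytes)

// header = true reads column names from the first line;
// inferSchema = true makes Spark detect column types.
val df = spark.read
  .option("header", true)
  .option("inferSchema", true)
  .csv(csv.toString)

val rows = df.count()
val cols = df.columns.toSeq
spark.stop()
```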

Requirement Suppose we have a source file containing basic information about employees, such as employee number, employee name, designation, and salary. The requirement is to find the max value in a Spark RDD using Scala. With this requirement, we will find out the maximum salary and the second maximum salary of an Read More →
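The max and second-max logic can be sketched on plain Scala collections, which share `map`, `reduce`, and `sortBy` with Spark RDDs; the employee records below are illustrative. On a real RDD the second maximum is usually fetched with `rdd.map(_._4).takeOrdered(2)(Ordering[Double].reverse)`.

```scala
// Employee records: (number, name, designation, salary); values illustrative.
val employees = Seq(
  (1, "alex",  "manager",   90000.0),
  (2, "john",  "developer", 75000.0),
  (3, "peter", "developer", 82000.0),
  (4, "jeff",  "analyst",   60000.0)
)

// Maximum salary via reduce -- the same call works on a Spark RDD.
val maxSalary = employees.map(_._4).reduce((a, b) => math.max(a, b))

// Second maximum: sort salaries descending and take the first two.
val topTwo = employees.map(_._4).sortBy(s => -s).take(2)
val secondMax = topTwo(1)

println((maxSalary, secondMax))
```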

Requirement Suppose we have a text-format data file containing employees' basic details. When we load this file in Spark, it returns an RDD. Our requirement is to find the number of partitions created just after loading the data file and see which records are stored Read More →
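A minimal sketch, assuming Spark on the classpath: `parallelize` with an explicit slice count stands in for `textFile` (where the partition count would come from the input splits), `getNumPartitions` reports the count, and `glom()` exposes each partition's records as an array.

```scala
import org.apache.spark.sql.SparkSession

// Assumes Spark is on the classpath; local mode for illustration.
val spark = SparkSession.builder().appName("partitions-demo").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Ask for 3 partitions; with sc.textFile(path) the count comes from input splits.
val rdd = sc.parallelize(1 to 10, numSlices = 3)

val numPartitions = rdd.getNumPartitions

// glom() turns each partition into an array, so we can see which records landed where.
val perPartition = rdd.glom().collect().map(_.toSeq).toSeq
spark.stop()
```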

Requirement The spark-shell creates an instance of the Spark context as sc. Also, we don't need to resolve dependencies while working in the spark-shell. But all of that is required if you move from the spark-shell to an IDE. So how do you create a Spark application in IntelliJ? In this post, we are going Read More →
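Resolving those dependencies in an IDE typically starts with a build file; the sketch below is a minimal `build.sbt` for an sbt-based IntelliJ project. The project name and the Scala and Spark versions are assumptions, so match them to your cluster.

```scala
// build.sbt -- a minimal sketch; name and versions are assumptions.
name := "spark-application"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.1",
  "org.apache.spark" %% "spark-sql"  % "3.5.1"
)
```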

Requirement You have the marks of all the students in a class, along with roll numbers, in a CSV file. The requirement is to calculate each student's percentage in Spark using Scala. Given: Download the sample CSV file , which has 7 columns; the 1st column is the roll number and the other 6 Read More →
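The per-student calculation can be sketched in plain Scala before mapping it over an RDD or data frame. The rows below are made-up stand-ins following the 7-column layout, and each subject is assumed to be out of 100.

```scala
// One CSV row per student: roll number followed by marks in 6 subjects.
// Column layout follows the post; the values are illustrative.
val rows = Seq(
  "101,65,72,80,90,55,78",
  "102,40,45,50,62,58,49"
)

val maxPerSubject = 100.0 // assumption: each subject is out of 100

val percentages = rows.map { line =>
  val parts = line.split(",")
  val roll  = parts(0)
  val marks = parts.drop(1).map(_.toDouble)
  // Percentage = total obtained / total possible * 100.
  val pct = marks.sum / (marks.length * maxPerSubject) * 100
  (roll, pct)
}

println(percentages)
```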

Requirement Assume you have a Hive table named reports, and it is required to process this dataset in Spark. Once we have the Hive table's data in a Spark data frame, we can transform it further as per the business needs. So let's try to load the Hive table into a Spark data Read More →
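A minimal sketch of the load, assuming Spark on the classpath. Against a real metastore you would add `.enableHiveSupport()` to the builder; since no metastore is assumed here, a temporary view named `reports` (with made-up columns) stands in for the Hive table so the snippet is self-contained. Either `spark.table` or `spark.sql` returns the table as a data frame.

```scala
import org.apache.spark.sql.SparkSession

// Assumes Spark is on the classpath; add .enableHiveSupport() for a real metastore.
val spark = SparkSession.builder().appName("hive-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the Hive table "reports"; columns are illustrative.
Seq((1, "daily"), (2, "weekly")).toDF("id", "kind").createOrReplaceTempView("reports")

// Either form loads the table as a data frame for further transformation.
val df  = spark.table("reports")
val df2 = spark.sql("SELECT * FROM reports WHERE kind = 'daily'")

val total = df.count()
val daily = df2.count()
spark.stop()
```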