Requirement The spark-shell is an environment where we can run Spark Scala code and see the output on the console after every line of code is executed. It is a more interactive environment. But when we have more lines of code, we prefer to write it in a file and … Read More →

Requirement You have two tables, named A and B, and you want to perform all types of joins in Spark using Scala. This will help you understand how joins work in Spark Scala. Solution Step 1: Input Files Download file  and  from here, and place them into a local … Read More →
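The join types the post walks through can be sketched with plain Scala collections; the post itself joins Spark DataFrames, and the tables A and B here, with their made-up keys and values, are stand-ins for illustration only:

```scala
// Two tiny "tables" keyed by id, standing in for DataFrames A and B.
val a = Map(1 -> "alice", 2 -> "bob", 3 -> "carol")
val b = Map(2 -> "sales", 3 -> "hr", 4 -> "it")

// Inner join: only keys present on both sides.
val inner = a.toList
  .collect { case (k, v) if b.contains(k) => (k, v, b(k)) }
  .sortBy(_._1)

// Left outer join: every key of A, with None where B has no match.
val leftOuter = a.toList
  .map { case (k, v) => (k, v, b.get(k)) }
  .sortBy(_._1)

// Full outer join: the union of keys, with an Option on both sides.
val fullOuter = (a.keySet ++ b.keySet).toList.sorted
  .map(k => (k, a.get(k), b.get(k)))
```

In Spark the same semantics come from `A.join(B, cond, "inner")`, `"left_outer"`, and `"full_outer"`; the sketch only shows which rows survive each join type.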

Requirement: You have the marks of all the students of a class and you want to find the ranks of the students using Python. Given: A pipe-separated file which contains the roll number and marks of each student; below are sample values: R_no marks 101 389 102 412 103 435 … Read More →
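The post solves this in Python; as a rough sketch of the ranking logic itself, here it is in Scala on the teaser's sample values (tie handling, which the full post may address, is ignored here):

```scala
// "roll_no|marks" records, taken from the post's sample values.
val lines = List("101|389", "102|412", "103|435")

// Parse each record, sort by marks descending, and assign ranks:
// the highest marks get rank 1.
val ranked = lines
  .map(_.split('|') match { case Array(r, m) => (r, m.toInt) })
  .sortBy(-_._2)
  .zipWithIndex
  .map { case ((roll, marks), i) => (roll, marks, i + 1) }
```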

Requirement Suppose we have a dataset which is in CSV format. We want to read the file in Spark using Scala. So the requirement is to create a Spark application which reads a CSV file into a Spark data frame using Scala. Components Involved The following components are involved: Spark RDD/Data Frame, Scala … Read More →

Requirement Suppose we have a source file which contains basic information about employees, like employee number, employee name, designation, salary, etc. The requirement is to find the max value in a Spark RDD using Scala. With this requirement, we will find the maximum salary and the second maximum salary of an … Read More →
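The max and second-max logic can be sketched on a plain Scala collection, since `reduce` on a `List` mirrors `RDD.reduce`; the employee records below are made up, not from the post's source file:

```scala
// (empNo, name, designation, salary) records; the values are made up.
val employees = List(
  (1001, "amit", "developer", 45000),
  (1002, "neha", "manager",   82000),
  (1003, "ravi", "tester",    51000)
)

// RDD-style reduce: keep whichever record has the higher salary.
val maxEmp = employees.reduce((x, y) => if (x._4 >= y._4) x else y)

// Second maximum: drop the top record, then reduce again.
// (On a real RDD one might instead sortBy(-salary) and take(2).)
val secondMax = employees
  .filter(_ != maxEmp)
  .reduce((x, y) => if (x._4 >= y._4) x else y)
```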

Requirement Suppose we have a text-format data file which contains employees' basic details. When we load this file in Spark, it returns an RDD. Our requirement is to find the number of partitions that were created just after loading the data file, and to see what records are stored … Read More →

Requirement In spark-shell, an instance of the Spark context is created as sc. Also, we don't need to resolve dependencies while working in the spark shell. But all of this is required if you move from the spark shell to an IDE. So how do you create a Spark application in IntelliJ? In this post, we are going … Read More →
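Moving from spark-shell to an IDE means declaring the Spark dependency yourself, typically through sbt. A minimal build.sbt might look like the following; the project name and all version numbers are illustrative, not from the post:

```scala
// build.sbt — minimal sbt build for a Spark application (versions are illustrative)
name := "spark-app"
version := "0.1"
scalaVersion := "2.12.18"

// spark-sql pulls in spark-core and the DataFrame API;
// "provided" because the cluster supplies Spark at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"
```

Unlike spark-shell, where sc already exists, an IDE application also has to build its own SparkSession before any Spark code runs.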

Requirement You have the marks of all the students of a class, with roll numbers, in a CSV file. You need to calculate the percentage of each student in Spark using Scala. Given: Download the sample CSV file , which has 7 columns: the 1st column is the roll number and the other 6 … Read More →
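The per-student percentage logic can be sketched in plain Scala before moving it onto a Spark data frame; the column layout (roll number plus 6 subject marks) is from the teaser, while the rows themselves are made up and each subject is assumed to be out of 100:

```scala
// "roll,m1,m2,m3,m4,m5,m6" rows; marks are made up, each out of 100.
val rows = List(
  "101,80,70,90,60,85,75",
  "102,50,60,55,65,70,45"
)

val percentages = rows.map { row =>
  val cols  = row.split(',')
  val roll  = cols.head
  val marks = cols.tail.map(_.toInt)
  // percentage = marks obtained / maximum possible (6 subjects x 100) * 100
  (roll, marks.sum * 100.0 / (marks.length * 100))
}
```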

Requirement Assume you have a Hive table named reports. It is required to process this dataset in Spark. Once we have the data of the Hive table in a Spark data frame, we can further transform it as per the business needs. So let's try to load the Hive table into a Spark data … Read More →

Requirement Suppose the source data is in a file, and the file format is text. The requirement is to load the text file into a Hive table using Spark and, in addition to this, to read the data from the Hive table using Spark. Therefore, let's break the task into sub-tasks: Load … Read More →

Requirement In the last post, we demonstrated how to load JSON data into a Hive non-partitioned table. This time, with the same sample JSON data, the requirement is to load the JSON data into a Hive partitioned table using Spark. The Hive table will be partitioned by some column(s). The below task … Read More →

Requirement Suppose there is source data which is in JSON format. The requirement is to load the JSON data into a Hive non-partitioned table using Spark. Let's break the requirement into two tasks: load the JSON data into a Spark data frame and read it; then store it in a Hive non-partitioned table. Components … Read More →