Requirement: You have two tables, named A and B, and you want to perform all types of joins in Spark using Python. This will help you understand how joins work in PySpark. Solution: Step 1: Input Files. Download the input files from here and place them in a local directory. Read More →

Requirement: You have the marks of all the students in a class and you want to find the students' ranks using Python. Given: A pipe-separated file that contains the roll number and marks of each student. Sample values (R_no, marks): (101, 389), (102, 412), (103, 435)… Read More →
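
One possible way to approach this in plain Python is sketched below. It assumes the file has a header row `R_no|marks` and that students with equal marks share a rank; neither detail is confirmed by the excerpt. The `rank_students` helper is a hypothetical name, and the sample text reuses the three values shown above.

```python
import csv
import io

# Sample values from the excerpt, laid out as a pipe-separated file.
# The header row is an assumption about the file layout.
sample = "R_no|marks\n101|389\n102|412\n103|435\n"


def rank_students(text):
    """Return (roll_no, marks, rank) tuples, highest marks first."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = [(r["R_no"], int(r["marks"])) for r in reader]
    # Sort by marks, descending; Python's sort is stable, so ties keep file order.
    rows.sort(key=lambda row: row[1], reverse=True)

    ranked = []
    for position, (roll_no, marks) in enumerate(rows, start=1):
        # Assumption: students with equal marks share the rank of the first of them.
        if ranked and ranked[-1][1] == marks:
            rank = ranked[-1][2]
        else:
            rank = position
        ranked.append((roll_no, marks, rank))
    return ranked


ranks = rank_students(sample)
for roll_no, marks, rank in ranks:
    print(roll_no, marks, rank)
```

For a real file, the same function could be fed the result of `open(path).read()`; a different tie-breaking rule would only change the branch marked as an assumption.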

Requirement: Let’s take a scenario where we have already loaded data into an RDD/DataFrame, but the row data has ended up in columns and the column data in rows. The requirement is to transpose the data, i.e. change rows into columns and columns into rows. Sample Data: We will use the sample data below. Read More →
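
The core of a transpose can be illustrated in plain Python, without Spark. This sketch assumes the data is small enough to hold as a list of rows in memory and that the first column holds labels that become the new header. The `transpose` helper and the sample values are hypothetical and are not the post's sample data.

```python
def transpose(header, rows):
    """Swap rows and columns; the first column's values become the new header."""
    # zip(*rows) regroups the values column by column.
    columns = list(zip(*rows))
    new_header = [header[0]] + [str(value) for value in columns[0]]
    # Each remaining original column becomes one row, labelled by its old name.
    new_rows = [
        [name] + list(values)
        for name, values in zip(header[1:], columns[1:])
    ]
    return new_header, new_rows


# Hypothetical data: one row per student, one column per subject.
header = ["name", "math", "physics"]
rows = [("A", 90, 80), ("B", 70, 60)]

new_header, new_rows = transpose(header, rows)
print(new_header)
for row in new_rows:
    print(row)
```

On a large distributed dataset, pulling everything into local memory is usually not an option, so this only shows the reshaping logic rather than a Spark implementation.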

Requirement: The spark-shell creates an instance of the Spark context as sc, and we don’t need to resolve dependencies while working in the Spark shell. Both become necessary once you move from the Spark shell to an IDE. So how do you create a Spark application in IntelliJ? In this post, we are going… Read More →

Requirement: Suppose the source data is in a text file. The requirement is to load the text file into a Hive table using Spark and then read the data back from the Hive table using Spark. Let’s break the task into sub-tasks: Load… Read More →

Requirement: In the last post, we demonstrated how to load JSON data into a Hive non-partitioned table. This time, using the same sample JSON data, the requirement is to load JSON data into a Hive partitioned table using Spark. The Hive table will be partitioned by some column(s). The below task… Read More →

Requirement: Suppose the source data is in JSON format. The requirement is to load JSON data into a Hive non-partitioned table using Spark. Let’s break the requirement into two tasks: (1) load the JSON data into a Spark DataFrame and read it; (2) store it in a Hive non-partitioned table. Components… Read More →