Calculate percentage using Pig


You have a CSV file containing the marks of every student in a class, keyed by roll number. The goal is to calculate each student's percentage using Pig.


Download the sample CSV file, marks.csv. It has 7 columns: the first is the roll number and the remaining six are the marks for subject1 through subject6.
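As an illustration of that layout, the sketch below writes a small file with the same seven-column shape and checks that every row has exactly 7 fields. The file name marks_sample.csv and all roll numbers and marks are hypothetical, invented for this example; they are not taken from the actual sample file.

```shell
# Hypothetical sample data: roll_no followed by six subject marks (out of 100).
# These values are invented for illustration only.
cat > marks_sample.csv <<'EOF'
101,78,85,90,66,72,88
102,55,61,47,70,64,59
103,92,95,89,97,91,94
EOF

# Count rows that do NOT have exactly 7 comma-separated fields.
bad_rows=$(awk -F',' 'NF != 7' marks_sample.csv | wc -l | tr -d ' ')
echo "rows with wrong column count: $bad_rows"
```

A non-zero count would mean some rows will not line up with the 7-field schema used later in the LOAD statement.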


Step 1: Load the sample CSV file marks.csv into HDFS

I have a local directory named “calculate-percentage-using-pig” under “/root/local_bdp/problems/”,
and I have placed the marks.csv file in it.

The sample data is shown in the screenshot below:

First, create an HDFS directory for the input files with the command below:

  hadoop fs -mkdir -p hdfs://

This creates the “ip” directory for the input files.
Now copy the file into HDFS with the command below:

  hadoop fs -put /root/local_bdp/problems/calculate-percentage-using-pig/marks.csv hdfs://


Step 2: Load the data into a Pig relation

Now it’s time to work in the grunt shell.
Enter the command below:

  pig

This takes you to the grunt shell.
Create a relation named ip_marks that holds all the data from marks.csv. Because the file is in HDFS, pass its HDFS location to LOAD, along with the 7 column names and their data types:

  ip_marks = LOAD 'hdfs://' USING PigStorage(',') AS (roll_no:int,subject1:int,subject2:int,subject3:int,subject4:int,subject5:int,subject6:int);

To see the contents of this relation, use the command below:

  DUMP ip_marks;

Refer to the screenshot below.

Step 3: Calculate the percentage

Use the command below to calculate the percentage:

  per_mrks = FOREACH ip_marks GENERATE roll_no, (float)(subject1+subject2+subject3+subject4+subject5+subject6)/6.0 AS percentage;

The command above adds the marks of all six subjects, casts the sum to float, and then divides it by 6. This assumes each subject is marked out of 100, so the maximum total is 600 and dividing by 6 yields a percentage. Change the formula if your subjects use a different maximum.
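To make the arithmetic concrete, here is a minimal sketch that works one row by hand with awk instead of Pig. The row values and the max_marks variable are hypothetical, introduced only for this example. It computes the percentage with the same "sum divided by 6" formula and with a general form, sum * 100 / (6 * max_marks), which is one way the formula could be adapted when subjects are not marked out of 100.

```shell
# Hypothetical row: roll number 101 with six marks, each out of 100.
row="101,78,85,90,66,72,88"
max_marks=100   # assumed maximum per subject

# Same formula as the Pig statement: sum of six subjects divided by 6.0.
pig_style=$(echo "$row" | awk -F',' '{ printf "%.2f", ($2+$3+$4+$5+$6+$7)/6.0 }')

# General form: total obtained * 100 / total possible.
general=$(echo "$row" | awk -F',' -v max="$max_marks" \
  '{ printf "%.2f", ($2+$3+$4+$5+$6+$7) * 100 / (6 * max) }')

echo "pig-style: $pig_style, general: $general"
```

With max_marks set to 100 the two results agree, which is why the shorter formula is enough in this post.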

Step 4: Output

Use the command below to see the contents of per_mrks:

  DUMP per_mrks;
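One way to cross-check the values that DUMP prints is to compute the same two columns locally. The sketch below is an assumption-laden stand-in, not part of the Pig workflow: the file marks_check.csv and its rows are hypothetical, and the output is rounded to two decimals, whereas Pig prints each row as a tuple in parentheses with its own number formatting. Compare the values rather than the exact text.

```shell
# Hypothetical input, invented for illustration (roll_no + six marks out of 100).
printf '%s\n' '101,78,85,90,66,72,88' '102,60,60,60,60,60,60' > marks_check.csv

# Emit roll_no,percentage per row, mirroring the two columns of per_mrks.
expected=$(awk -F',' '{ printf "%s,%.2f\n", $1, ($2+$3+$4+$5+$6+$7)/6.0 }' marks_check.csv)
echo "$expected"
```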

To save the data to a file, use the command below:

  STORE per_mrks INTO 'hdfs://' USING PigStorage(',');

This creates the “op” directory, which must not already exist or the STORE will fail. List the output files with the command below:

  hadoop fs -ls hdfs://

Please refer to the screenshot below.
