Calculate percentage in hive

Requirement

You have marks of all the students of a class with roll number in CSV file, It is needed to calculate the percentage of each student in hive.

Given:

Download the sample CSV file marks which have 7 columns, 1st column is Roll no and other 6 columns are subject1 subject2….subject6.

Solution

Step 1: Loading the sample CSV file marks.csv into Hive Table.

I have Local directory named as “calculate-percentage-in-hive” in path “/root/local_bdp/problems/”
So I have kept marks.csv file in that path.

You can see sample data in below screen shot:-

Let’s create HDFS directory using below command

 
 
  1. hadoop fs -mkdir -p hdfs://sandbox.hortonworks.com:8020/user/root/bdp/problems/cal_per_hive/ip/

As you can see “ip” directory is created for input files.
Now we can load file into HDFS using below command

 
 
  1. hadoop fs -put /root/local_bdp/problems/calculate-percentage-in-hive/marks.csv hdfs://sandbox.hortonworks.com:8020/user/root/bdp/problems/cal_per_hive/ip/

If you have trouble understanding loading of csv file in to hive you can refer here.Use below commands in hive to create an external table.

 
 
  1. CREATE SCHEMA IF NOT EXISTS bdp;
  2. CREATE EXTERNAL TABLE IF NOT EXISTS bdp.hv_per
  3. (roll_no INT,subject1 INT,subject2 INT,subject3 INT,subject4 INT,subject5 INT,subject6 INT)
  4. ROW FORMAT DELIMITED
  5. FIELDS TERMINATED BY ','
  6. STORED AS TEXTFILE
  7. LOCATION 'hdfs://sandbox.hortonworks.com:8020/user/root/bdp/problems/cal_per_hive/ip/';

You can skip above steps if data is already available in hive table.

Step 2 : Hive

Now it’s time to interact with hive CLI.

Enter the below command:-

 
 
  1. hive

It will take you to hive CLI.As we have created a hv_per table in bdp schema.Which have all 7 columns.Use below command to see whether data is loaded or not.

 
 
  1. Select * FROM bdp.hv_per;

Step 3: Calculation of Percentage

Use below query to calculate percentage .

 
 
  1. SELECT roll_no,CAST(subject1+subject2+subject3+subject4+subject5+subject6 AS float)/6.0 AS percentage
  2. FROM bdp.hv_per;

As you can see first we are summing all subjects and then casting the result of the sum in to float, after that we are dividing it by 6.0, As  we have assumed that total marks of each subject are 100.you can change the formula if you wish.

Refer below screen shot. Which is the output of the query.

Don’t forget to subscribe us. Keep learning. Keep sharing.

If you want to learn how to calculate percentage in spark using scala,click here.

 

Load CSV file into hive AVRO table

Requirement You have comma separated(CSV) file and you want to create Avro table in hive on top of it, then ...
Read More

Load CSV file into hive PARQUET table

Requirement You have comma separated(CSV) file and you want to create Parquet table in hive on top of it, then ...
Read More

Hive Most Asked Interview Questions With Answers – Part II

What is bucketing and what is the use of it? Answer: Bucket is an optimisation technique which is used to ...
Read More
/ hive, hive interview, interview-qa

Spark Interview Questions Part-1

Suppose you have a spark dataframe which contains millions of records. You need to perform multiple actions on it. How ...
Read More

Leave a Reply