Find max value of a row in Hive

Find max value of a row in Hive

Requirement

Suppose we are having some data in a hive table. The table contains information about company’s quarterly wise profit. Now, the requirement is to find max profit of each company from all quarters.

Sample Data

The record having 5 columns – company name, quarter 1 as Q1, quarter 2 as Q2, quarter 3 as Q3 and quater4 as Q4. The data looks like

NAME

Q1

Q2

Q3

Q4

C1

10.4

12.3

9.6

11.4

C2

9.5

8.7

10.6

11.3

C3

9.2

11.7

12.2

10.8

C4

12.3

10.6

11.3

9.8

Solution

Let’s first create a table and insert the sample data. After that, we will find the solution to achieve the requirement.

Step 1: Create Hive Table

Create a hive table say Company_Finance_Detail. The DDL will look like:

create-table
 
CREATE TABLE Company_Finance_Detail (
    name STRING, 
    Q1 DOUBLE, 
    Q2 DOUBLE, 
    Q3 DOUBLE, 
    Q4 DOUBLE
) STORED AS ORC;

Here, we have created hive table with ORC storage.

Step 2: Load data

Use below command to insert data into created hive table.

 
 
INSERT INTO Company_Finance_Detail VALUES ('C1', 10.4, 12.3, 9.6, 11.4),                                       ('C2', 9.5, 8.7, 10.6, 11.3),                                            ('C3', 9.2, 11.7, 12.2, 10.8),                                            ('C4', 12.3, 10.6, 11.3, 9.8);

 
 
select * from Company_Finance_Detail;

Step 3: Find MAX profit of each Company

Approach 1: Here, we cannot use the max function. Basically, the max function finds the maximum value from a column data. But we have to find the max of each row. So, here we have used below query.

 
 
SELECT name,
       (CASE
            WHEN Q1 > Q2
                 AND Q1 > Q3
                 AND Q1 > Q4 THEN Q1
            WHEN Q2 > Q3
                 AND Q2 > Q4 THEN Q2
            WHEN Q3 > Q4
                 AND Q3 > Q4 THEN Q3
            ELSE Q4
        END) AS Quarter
FROM Company_Finance_Detail;

In the above query, we are using case condition and checking every quarter’s value of each company. We are getting the correct output. But what if more column presents in the table. Then we have to append additional query in case condition for the rest columns.

Approach 2: Instead of using this, we can use a hive function called greatest. This function is available from hive version 1.1 onward.

greatest(T v1, T v2, T v3) – Takes multiple values and return max among them.

Now, let’s use the greatest function:

 
 
SELECT name, greatest(Q1, Q2, Q3, Q4) FROM Company_Finance_Detail;

Here, we can pass multiple columns of the same data type to find out max value among them.

Note: The greatest function works fine if your table is in ORC format.

What If Hive Table is not in ORC format

Suppose, we have created a hive table Company_Finance_Detail_TEMP with default storage. So, the DDL will look like:

 
 
  1. CREATE TABLE Company_Finance_Detail_TEMP (
  2. name STRING,
  3. Q1 DOUBLE,
  4. Q2 DOUBLE,
  5. Q3 DOUBLE,
  6. Q4 DOUBLE
  7. );

 
 
INSERT INTO Company_Finance_Detail_TEMP VALUES ('C1', 10.4, 12.3, 9.6, 11.4),                                            ('C2', 9.5, 8.7, 10.6, 11.3),                                            ('C3', 9.2, 11.7, 12.2, 10.8),                                            ('C4', 12.3, 10.6, 11.3, 9.8);

Now, let’s use the same query which has been used above to find max profit of each company.

 
 
SELECT name, greatest(Q1, Q2, Q3, Q4) FROM Company_Finance_Detail_TEMP;

Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating greatest(q1,q2,q3,q4)

In order to fix this exception, we have to explicitly cast all the columns in their data type. The data type of the quarter column is double. Let’s try the updated query:

 
 
SELECT name, greatest(DOUBLE(Q1), DOUBLE(Q2), DOUBLE(Q3), DOUBLE(Q4)) FROM Company_Finance_Detail_TEMP;

Greatest function work if all the columns are of same data type and only primitive data type.

Wrapping Up

In this post, we have achieved how to find max value of a row in hive table. We have used both cases and hive build-in greatest function. Also, we have seen how greatest function behaving with different storage.

46
0

Join in hive with example

Requirement You have two table named as A and B. and you want to perform all types of join in ...
Read More

Join in pyspark with example

Requirement You have two table named as A and B. and you want to perform all types of join in ...
Read More

Join in spark using scala with example

Requirement You have two table named as A and B. and you want to perform all types of join in ...
Read More

Java UDF to convert String to date in PIG

About Code Many times it happens like you have received data from many systems and each system operates on a ...
Read More
/ java udf, Pig, pig, pig udf, string to date, udf

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.