Load hive table into pig

Requirement

You have a table in Hive, and you need to process its data using Pig. To load data directly from a file we generally use PigStorage(), but to load data from a Hive table we need a different load function. Let's go through it step by step.

Solution


Step 1: Load Data

Assume that we don't have any table in Hive, so let's create one first. Log in to Hive and load data into the table "profits" under the bdp schema.
After executing the queries below, verify that the data is loaded successfully.

Use the below command to create a table.

 
 
  CREATE SCHEMA IF NOT EXISTS bdp;
  CREATE TABLE bdp.profits (product_id INT, profit BIGINT);

Use the below command to insert data into the table profits.

 
 
  INSERT INTO TABLE bdp.profits VALUES
  (123, 1365), (124, 3253), (125, 91522),
  (123, 51842), (127, 19616), (128, 2433),
  (127, 182652), (130, 21632), (122, 21632),
  (127, 21632), (135, 21632), (123, 21632), (135, 3282);

 

Verify that the data is loaded using the below command.

 
 
  SELECT * FROM bdp.profits;

Step 2: Import table into pig

As we need to process this dataset using Pig, let's go to the grunt shell. Use the below command to enter the grunt shell. Remember, -useHCatalog is a must, as we need the jars required to fetch data from Hive.

 
 
  pig -useHCatalog

Let's create a relation PROFITS into which we can load data from the Hive table.

 
 
  PROFITS = LOAD 'bdp.profits' USING org.apache.hive.hcatalog.pig.HCatLoader();
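Once loaded, PROFITS behaves like any other Pig relation. As a quick sketch (assuming the product_id and profit fields created in Step 1), the total profit per product could be computed like this:

```pig
-- Group the rows of PROFITS by product
BY_PRODUCT = GROUP PROFITS BY product_id;

-- Sum the profit values within each group
TOTALS = FOREACH BY_PRODUCT GENERATE
             group AS product_id,
             SUM(PROFITS.profit) AS total_profit;

DUMP TOTALS;
```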

Step 3: Output

Enter the below command to see whether the data is loaded or not.

 
 
  DUMP PROFITS;

DUMP PROFITS will print all the rows loaded from the Hive table.

Remember, unlike with PigStorage(), we don't need to define a schema after HCatLoader(), because it fetches the schema directly from the Hive metastore.
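For contrast, if the same data lived in a plain delimited file, PigStorage() would need the schema spelled out by hand with AS (the file path here is hypothetical):

```pig
-- PigStorage() does not know the schema; it must be declared explicitly
PROFITS_FILE = LOAD '/data/profits.csv' USING PigStorage(',')
               AS (product_id:int, profit:long);
```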

To confirm the schema, you can use the below command.

 
 
  DESCRIBE PROFITS;

Wrapping Up

While working on a project we get data from many sources, and Hive can be one of them. Apache Pig is a good fit when we are dealing with a mix of unstructured and structured data; in that case we can use HCatLoader to import the Hive table and process it together with other datasets.
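As a sketch of that idea, the Hive-backed relation can be joined with a dataset loaded from a flat file (the products file, its path, and its fields are hypothetical):

```pig
-- Hive table via HCatLoader; schema comes from the metastore
PROFITS = LOAD 'bdp.profits' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Flat file via PigStorage; schema declared manually
PRODUCTS = LOAD '/data/products.csv' USING PigStorage(',')
           AS (product_id:int, product_name:chararray);

-- Combine the two sources on product_id
JOINED = JOIN PROFITS BY product_id, PRODUCTS BY product_id;
DUMP JOINED;
```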

Keep solving, keep learning.
