Load Hive table into Pig

Requirement

You have a table in Hive, and you need to process that table's data using Pig. To load data directly from a file we generally use PigStorage(), but to load data from a Hive table we need a different load function. Let's go through it step by step.

Solution


Step 1: Load Data

Assume that we don't have any table in Hive yet, so let's create one first. Log in to Hive and load data into the table "profits", which is under the bdp schema.
After executing the queries below, verify that the data is loaded successfully.

Use the below command to create the table.

 
 
  CREATE SCHEMA IF NOT EXISTS bdp;
  CREATE TABLE bdp.profits (product_id INT, profit BIGINT);

Use the below command to insert data into the profits table.

 
 
  INSERT INTO TABLE bdp.profits VALUES
  (123,1365),(124,3253),(125,91522),
  (123,51842),(127,19616),(128,2433),
  (127,182652),(130,21632),(122,21632),
  (127,21632),(135,21632),(123,21632),(135,3282);

 

Verify that the data is loaded using the below command.

 
 
  SELECT * FROM bdp.profits;

Step 2: Import table into Pig

As we need to process this dataset using Pig, let's go to the grunt shell. Use the below command to enter the grunt shell; remember that -useHCatalog is a must, as we need the jars required to fetch data from Hive.

 
 
  pig -useHCatalog

Let's create one relation, PROFITS, into which we can load data from the Hive table.

 
 
  PROFITS = LOAD 'bdp.profits' USING org.apache.hive.hcatalog.pig.HCatLoader();
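Once loaded, PROFITS behaves like any other Pig relation. As a purely illustrative follow-up (not part of the original walkthrough), you could aggregate the total profit per product like this:

```
-- group the rows by product_id
GROUPED = GROUP PROFITS BY product_id;
-- sum the profit within each group
TOTALS = FOREACH GROUPED GENERATE group AS product_id, SUM(PROFITS.profit) AS total_profit;
DUMP TOTALS;
```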

Step 3: Output

Enter the below command to see whether the data is loaded or not.

 
 
  dump PROFITS;

dump PROFITS will print each row of the table as a tuple.
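Based on the rows inserted in Step 1, the output should contain the following tuples (the order may differ across runs, since the dump is backed by a MapReduce job):

```
(123,1365)
(124,3253)
(125,91522)
(123,51842)
(127,19616)
(128,2433)
(127,182652)
(130,21632)
(122,21632)
(127,21632)
(135,21632)
(123,21632)
(135,3282)
```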

Remember that, unlike with PigStorage(), we don't need to define a schema after HCatLoader(), because it fetches the schema directly from the Hive metastore.

To confirm the schema, you can use the below command.

 
 
  describe PROFITS;
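Given the table definition from Step 1, the describe output should look something like the following (HCatLoader maps Hive INT to Pig int and Hive BIGINT to Pig long):

```
PROFITS: {product_id: int, profit: long}
```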

Wrapping Up

While working on a project we get data from many sources, and Hive can be one of them. Apache Pig is a good fit when we are dealing with both structured and unstructured data; in that case we can use HCatLoader to import a Hive table and process it along with other datasets.
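As a sketch of that kind of cross-source processing, suppose there is a hypothetical CSV file products.csv with columns (product_id, product_name). The Hive-backed relation could be joined with it like this (the file name and its columns are assumptions, not part of the original tutorial):

```
-- load the Hive table via HCatalog, as in Step 2
PROFITS = LOAD 'bdp.profits' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- load a hypothetical file-based dataset with PigStorage
PRODUCTS = LOAD 'products.csv' USING PigStorage(',') AS (product_id:int, product_name:chararray);
-- join the two sources on product_id
JOINED = JOIN PROFITS BY product_id, PRODUCTS BY product_id;
DUMP JOINED;
```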

Keep solving, keep learning. Subscribe to us.
