Load Hive table into Pig

Requirement

You have a table in Hive, and you need to process its data using Pig. To load data directly from a file we generally use PigStorage(), but to load data from a Hive table we need a different loading function. Let’s go through it in detail, step by step.

Solution

Step 1: Load Data

Assume that we don’t have any table in Hive, so let’s create one first. Start the Hive CLI, then create and load data into the table “profits” under the bdp schema.
After executing the queries below, verify that the data loaded successfully.

Use the below commands to create the schema and table:

CREATE SCHEMA IF NOT EXISTS bdp;

CREATE TABLE bdp.profits (product_id INT, profit BIGINT);

Use the below command to insert data into the profits table:

 INSERT INTO TABLE bdp.profits VALUES
 (123,1365),(124,3253),(125,91522),
 (123,51842),(127,19616),(128,2433),
 (127,182652),(130,21632),(122,21632),
 (127,21632),(135,21632),(123,21632),(135,3282);

Verify that the data is loaded using the below command:

 SELECT * FROM bdp.profits;
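If the insert succeeded, the query should return the 13 rows added above (row order may vary depending on how Hive wrote the files):

 123	1365
 124	3253
 125	91522
 123	51842
 127	19616
 128	2433
 127	182652
 130	21632
 122	21632
 127	21632
 135	21632
 123	21632
 135	3282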

Step 2: Import table into Pig

As we need to process this dataset using Pig, let’s enter the Grunt shell with the command below. Remember that -useHCatalog is a must, as it pulls in the jars required to fetch data from Hive.

 pig -useHCatalog

Let’s create a relation PROFITS into which we load the data from the Hive table.

 PROFITS = LOAD 'bdp.profits' USING org.apache.hive.hcatalog.pig.HCatLoader();

Step 3: Output

Enter the below command to see whether the data is loaded or not.

 DUMP PROFITS;

DUMP PROFITS will give the result below.
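With the rows inserted in Step 1, the dump prints one tuple per row of the Hive table, along the lines of the following (tuple order is not guaranteed):

 (123,1365)
 (124,3253)
 (125,91522)
 (123,51842)
 (127,19616)
 (128,2433)
 (127,182652)
 (130,21632)
 (122,21632)
 (127,21632)
 (135,21632)
 (123,21632)
 (135,3282)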

Remember, unlike with PigStorage(), we don’t need to define a schema after HCatLoader(), because it fetches the schema directly from the Hive metastore.
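For contrast, here is a minimal sketch of the same load done from a flat file with PigStorage(), where the schema must be declared by hand; the file path and comma delimiter are assumptions, not part of this example:

 -- Hypothetical flat-file load: path and delimiter are assumptions
 PROFITS_FILE = LOAD '/user/bdp/profits.csv' USING PigStorage(',')
                AS (product_id:int, profit:long);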

To confirm the schema, you can use the below command:

 DESCRIBE PROFITS;
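Because HCatLoader() picks the schema up from the metastore, DESCRIBE should report the Hive columns mapped to Pig types (Hive INT becomes Pig int, BIGINT becomes long), along the lines of:

 PROFITS: {product_id: int,profit: long}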

Wrapping Up

While working on a project, we get data from many sources, and Hive can be one of them. When we are dealing with a mix of structured and unstructured data, Apache Pig is a good fit; in that case, we can use HCatLoader to import the Hive table and process it alongside other datasets, as the sketch below shows.
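As a minimal sketch of that idea, the script below joins the Hive-backed relation with data from a flat file; the products file (its path, delimiter, and fields) is hypothetical:

 -- Hive-backed relation, as loaded in Step 2
 PROFITS = LOAD 'bdp.profits' USING org.apache.hive.hcatalog.pig.HCatLoader();

 -- Hypothetical product master file: path, delimiter, and schema are assumptions
 PRODUCTS = LOAD '/user/bdp/products.csv' USING PigStorage(',')
            AS (product_id:int, product_name:chararray);

 -- Join the Hive data with the file data on product_id
 JOINED = JOIN PROFITS BY product_id, PRODUCTS BY product_id;
 DUMP JOINED;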

Keep solving, keep learning.

