Load files into Hive Partitioned Table

Requirement

There are two files that contain employees' basic information. One file holds the details of employees who joined in 2012, and the other holds the details of employees who joined in 2013. We want to load these files into a Hive table that is partitioned by year of joining.

Components Involved

  • Hive
  • HDFS

Sample Data

The sample data consists of two files, 2012.txt and 2013.txt, containing the employee records for the respective years.

You can download these files from here.

Solution

There are two types of partitioning in Hive: static and dynamic. Here, we are going to use static partitioning, where the partition value is specified explicitly in each load statement.
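For contrast, here is a minimal sketch of what a dynamic-partition load could look like, assuming a hypothetical staging table db_bdpbase.employee_staging that holds all years together (the staging table is for illustration only; the target table is the one created in Step 2):

-- Allow Hive to derive partition values from the data itself
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (year) is taken from the last column of the SELECT
INSERT INTO TABLE db_bdpbase.Employee PARTITION (year)
SELECT id, firstname, lastname, gender, designation, city, country, year
FROM db_bdpbase.employee_staging;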

Step 1: Data Preparation

Once you have downloaded the data given above, copy it to an HDFS location. We can use the command below to copy the files from the local filesystem to HDFS.

In my case, the local path is /root/bdp/hive and the HDFS path is /user/bdp/hive:

hadoop fs -copyFromLocal /root/bdp/hive/*.txt /user/bdp/hive
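To confirm that the copy succeeded, you can list the target HDFS directory (the path below matches my setup; adjust it to yours):

hadoop fs -ls /user/bdp/hive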

Step 2: Create Partitioned Table

Let’s create the Hive partitioned table:

CREATE TABLE db_bdpbase.Employee(
  id INT,
  firstname STRING,
  lastname STRING,
  gender STRING,
  designation STRING,
  city STRING,
  country STRING
) PARTITIONED BY (
  year INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");

Here, we have created a partitioned table that is partitioned by the year of joining, i.e. the year column. The data type of the partition column is INT.
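If you want to verify the table definition, describing the table shows year listed separately under the partition information section:

DESCRIBE FORMATTED db_bdpbase.Employee;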

Step 3: Load data into Partitioned Table

In this step, we will load the files that are already present in the HDFS location. Here, we will load the 2012.txt file data into the year=2012 partition and the 2013.txt file data into the year=2013 partition.

LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2013.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2013);
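Note that LOAD DATA INPATH moves the files from the source HDFS directory into the table’s partition directories. If the files are still available on the local filesystem (at /root/bdp/hive in my case), an alternative is to load them directly with the LOCAL keyword, which copies rather than moves:

LOAD DATA LOCAL INPATH '/root/bdp/hive/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
LOAD DATA LOCAL INPATH '/root/bdp/hive/2013.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2013);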

Let’s check how many partitions there are in the table:

SHOW PARTITIONS db_bdpbase.Employee;
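As a further check, a quick count per partition should show the rows split by year of joining (the exact counts depend on the sample files):

SELECT year, COUNT(*) AS employee_count
FROM db_bdpbase.Employee
GROUP BY year;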

Check the data:

SELECT * FROM db_bdpbase.employee;
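Since the table is partitioned on year, a query that filters on the partition column reads only the matching partition directory, which is the main benefit of partitioning here. For example:

-- Reads only the year=2012 partition
SELECT * FROM db_bdpbase.employee WHERE year = 2012;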

Wrapping Up

In this post, we have used static partitioning to store the data. If you want to dig deeper into static and dynamic partitioned tables in Hive, then read the Partitioning in Hive post.
