Load files into Hive Partitioned Table

Requirement

There are two files which contain employee’s basic information. One file store employee’s details who have joined in the year of 2012 and another is for the employees who have joined in the year of  2013. Now, we want to load files into hive partitioned table which is partitioned by year of joining.

Components Involved

  • Hive
  • HDFS

Sample Data

Here are the given sample data of two files.

You can download these file from here.

2012 2013

Solution

There are two types of partition in the hive – Static and Dynamic Partition. Here, we are going to use the Static partition.

Step 1: Data Preparation

Once you downloaded the data given above, copy it to HDFS location. We can use below command to copy the files from LOCAL to HDFS:

In my case, local path is /root/bdp/hive and HDFS path is /user/bdp/hive

 hadoop fs -copyFromLocal /root/bdp/hive/*.txt /user/bdp/hive

Step 2: Create Partitioned Table

Let’s create the hive partitioned table:

 CREATE TABLE db_bdpbase.Employee(
    id INT,
    firstname STRING,
    lastname STRING,
    gender STRING,
    designation STRING,
    city STRING,
    country STRING
) PARTITIONED BY (
    year INT
) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");

Here, we have created a partitioned table which is partitioned by year of joining i.e. year. The data type of the partitioned column is INT.

Step 3: Load data into Partitioned Table

In this step, We will load the same files which are present in HDFS location. Here, we will load 2012.txt file data into year=2012 partition and 2013.txt file data into year=2013 partition.

 LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2013.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2013);

Let’s check how many partitions are there in the table:

 Show partitions db_bdpbase.Employee;

Check the data

 select * from db_bdpbase.employee;

Wrapping Up

In this post, we have used static partitioned to store the data. If you want to dig deep into static and dynamic partitioned table in hive, then read Partitioning in Hive post.

Sharing is caring!

Subscribe to our newsletter
Loading

Leave a Reply