Requirement
There are two files which contain employee’s basic information. One file store employee’s details who have joined in the year of 2012 and another is for the employees who have joined in the year of 2013. Now, we want to load files into hive partitioned table which is partitioned by year of joining.
Components Involved
- Hive
- HDFS
Sample Data
Here are the given sample data of two files.
You can download these file from here.
2012 2013Solution
There are two types of partition in the hive – Static and Dynamic Partition. Here, we are going to use the Static partition.
Step 1: Data Preparation
Once you downloaded the data given above, copy it to HDFS location. We can use below command to copy the files from LOCAL to HDFS:
In my case, local path is /root/bdp/hive and HDFS path is /user/bdp/hive
hadoop fs -copyFromLocal /root/bdp/hive/*.txt /user/bdp/hive
Step 2: Create Partitioned Table
Let’s create the hive partitioned table:
CREATE TABLE db_bdpbase.Employee( id INT, firstname STRING, lastname STRING, gender STRING, designation STRING, city STRING, country STRING ) PARTITIONED BY ( year INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE TBLPROPERTIES("skip.header.line.count"="1");
Here, we have created a partitioned table which is partitioned by year of joining i.e. year. The data type of the partitioned column is INT.
Step 3: Load data into Partitioned Table
In this step, We will load the same files which are present in HDFS location. Here, we will load 2012.txt file data into year=2012 partition and 2013.txt file data into year=2013 partition.
LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012); LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2013.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2013);
Let’s check how many partitions are there in the table:
Show partitions db_bdpbase.Employee;
Check the data
select * from db_bdpbase.employee;
Wrapping Up
In this post, we have used static partitioned to store the data. If you want to dig deep into static and dynamic partitioned table in hive, then read Partitioning in Hive post.