Load files into Hive Partitioned Table

Requirement

There are two files which contain employee’s basic information. One file store employee’s details who have joined in the year of 2012 and another is for the employees who have joined in the year of  2013. Now, we want to load files into hive partitioned table which is partitioned by year of joining.

Components Involved

  • Hive
  • HDFS

Sample Data

Here are the given sample data of two files.

You can download these file from here.

2012 2013

Solution

There are two types of partition in the hive – Static and Dynamic Partition. Here, we are going to use the Static partition.

Step 1: Data Preparation

Once you downloaded the data given above, copy it to HDFS location. We can use below command to copy the files from LOCAL to HDFS:

In my case, local path is /root/bdp/hive and HDFS path is /user/bdp/hive

 
 
  1. hadoop fs -copyFromLocal /root/bdp/hive/*.txt /user/bdp/hive

Step 2: Create Partitioned Table

Let’s create the hive partitioned table:

 
 
  1. CREATE TABLE db_bdpbase.Employee(
  2. id INT,
  3. firstname STRING,
  4. lastname STRING,
  5. gender STRING,
  6. designation STRING,
  7. city STRING,
  8. country STRING
  9. ) PARTITIONED BY (
  10. year INT
  11. ) ROW FORMAT DELIMITED
  12. FIELDS TERMINATED BY ','
  13. STORED AS TEXTFILE
  14. TBLPROPERTIES("skip.header.line.count"="1");

Here, we have created a partitioned table which is partitioned by year of joining i.e. year. The data type of the partitioned column is INT.

Step 3: Load data into Partitioned Table

In this step, We will load the same files which are present in HDFS location. Here, we will load 2012.txt file data into year=2012 partition and 2013.txt file data into year=2013 partition.

 
 
  1. LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
  2. LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/bdp/hive/2013.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2013);

Let’s check how many partitions are there in the table:

 
 
  1. Show partitions db_bdpbase.Employee;

Check the data

 
 
  1. SELECT * FROM db_bdpbase.employee;

Wrapping Up

In this post, we have used static partitioned to store the data. If you want to dig deep into static and dynamic partitioned table in hive, then read Partitioning in Hive post.

Load CSV file into hive AVRO table

Requirement You have comma separated(CSV) file and you want to create Avro table in hive on top of it, then ...
Read More

Load CSV file into hive PARQUET table

Requirement You have comma separated(CSV) file and you want to create Parquet table in hive on top of it, then ...
Read More

Hive Most Asked Interview Questions With Answers – Part II

What is bucketing and what is the use of it? Answer: Bucket is an optimisation technique which is used to ...
Read More
/ hive, hive interview, interview-qa

Spark Interview Questions Part-1

Suppose you have a spark dataframe which contains millions of records. You need to perform multiple actions on it. How ...
Read More

Leave a Reply