Data migration from Hive to HBase

Requirement

Suppose we have data in Hive table. We want the same data into HBase table. So, our requirement is to migrate the data from Hive to HBase table.

Components Involved

  • Hive – Source table
  • HBase – Target Table

Solution

We cannot load data directly into HBase table from the hive. In order to achieve the requirement, we have to go through the following steps:

Step 1: Create Hive table

If you already have hive table with data then jump to step 3.

We are creating this hive table as a source. This table data, we want in HBase table.

 CREATE TABLE hive_table(
  empno int,
  ename string,
  designation string,
  manager int,
  sal int,
  deptno int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Step 2: Load data into Hive

Loading the data from the local path. In my case, the local path is /root/bdp/hbase/data/emp_data.csv.

 LOAD DATA LOCAL INPATH '/root/bdp/hbase/data/emp_data.csv' INTO TABLE hive_table;

Step 3: Create HBase-Hive Mapping table

In this step, we are creating another hive table which actually points to an HBase table.

 CREATE TABLE hbase_table_employee 
  ( 
     empno       INT, 
     ename       STRING, 
     designation STRING, 
     manager     INT, 
     hire_date   STRING, 
     sal         INT, 
     deptno      INT 
  )
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:ename,cf:designation,cf:manager,cf:hire_date,cf:sal,cf:deptno")
TBLPROPERTIES ("hbase.table.name" = "employee_hbase");

Here, we are specifying HBaseStorageHandler in Stored By option. Also, mapping the hive column with HBase column family using SERDEPROPERTIES. It will create an HBase table named employee_hbase which will point to this hive table.

Note: Key represents the first column of the hive table. So Id will become the key.

In case, you are already having an HBase table and want to load data into existing HBase table, then you have to use EXTERNAL in your above hive DDL. Otherwise, you will get an error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:MetaException(message:Table employee_hbase_2 already exists within HBase; use CREATE EXTERNAL TABLE instead to register it in Hive.)

 CREATE EXTERNAL TABLE hbase_table_employee_2
  ( 
     empno       INT, 
     ename       STRING, 
     designation STRING, 
     manager     INT, 
     hire_date   STRING, 
     sal         INT, 
     deptno      INT 
  )
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:ename,cf:designation,cf:manager,cf:hire_date,cf:sal,cf:deptno")
TBLPROPERTIES ("hbase.table.name" = "employee_hbase_2");

Step 4: Load data into HBase from Hive

In this step, we are going to migrate hive table data to HBase. That means we will load the hive (created in step 1) data to hive table created in step 3.

 INSERT INTO TABLE hbase_table_employee SELECT * FROM hive_table;

We have loaded data into hbase_table_employee table which is pointing to HBase table employee_hbase.

Step 5: Scan HBase Table

Let’s check the data in HBase table:

 hbase(main):008:0> scan 'employee_hbase'
ROW                             COLUMN+CELL
 7369                           column=cf:deptno, timestamp=1514476352028, value=20
 7369                           column=cf:designation, timestamp=1514476352028, value=CLERK
 7369                           column=cf:ename, timestamp=1514476352028, value=SMITH
 7369                           column=cf:hire_date, timestamp=1514476352028, value=12/17/1980
 7369                           column=cf:manager, timestamp=1514476352028, value=7902
 7369                           column=cf:sal, timestamp=1514476352028, value=800
 7499                           column=cf:deptno, timestamp=1514476352028, value=30
 7499                           column=cf:designation, timestamp=1514476352028, value=SALESMAN
 7499                           column=cf:ename, timestamp=1514476352028, value=ALLEN
 7499                           column=cf:hire_date, timestamp=1514476352028, value=2/20/1981
 7499                           column=cf:manager, timestamp=1514476352028, value=7698
 7499                           column=cf:sal, timestamp=1514476352028, value=1600
 7521                           column=cf:deptno, timestamp=1514476352028, value=30
 7521                           column=cf:designation, timestamp=1514476352028, value=SALESMAN
 7521                           column=cf:ename, timestamp=1514476352028, value=WARD
 7521                           column=cf:hire_date, timestamp=1514476352028, value=2/22/1981
 7521                           column=cf:manager, timestamp=1514476352028, value=7698
 7521                           column=cf:sal, timestamp=1514476352028, value=1250
 7566                           column=cf:deptno, timestamp=1514476352028, value=20
 7566                           column=cf:designation, timestamp=1514476352028, value=MANAGER
 7566                           column=cf:ename, timestamp=1514476352028, value=TURNER
 7566                           column=cf:hire_date, timestamp=1514476352028, value=4/2/1981
 7566                           column=cf:manager, timestamp=1514476352028, value=7839
 7566                           column=cf:sal, timestamp=1514476352028, value=2975
 7654                           column=cf:deptno, timestamp=1514476352028, value=30
 7654                           column=cf:designation, timestamp=1514476352028, value=SALESMAN
 7654                           column=cf:ename, timestamp=1514476352028, value=MARTIN
 7654                           column=cf:hire_date, timestamp=1514476352028, value=9/28/1981
 7654                           column=cf:manager, timestamp=1514476352028, value=7698
 7654                           column=cf:sal, timestamp=1514476352028, value=1250
 7698                           column=cf:deptno, timestamp=1514476352028, value=30
 7698                           column=cf:designation, timestamp=1514476352028, value=MANAGER
 7698                           column=cf:ename, timestamp=1514476352028, value=MILLER
 7698                           column=cf:hire_date, timestamp=1514476352028, value=5/1/1981
 7698                           column=cf:manager, timestamp=1514476352028, value=7839
 7698                           column=cf:sal, timestamp=1514476352028, value=2850
 7782                           column=cf:deptno, timestamp=1514476352028, value=10
 7782                           column=cf:designation, timestamp=1514476352028, value=MANAGER
 7782                           column=cf:ename, timestamp=1514476352028, value=CLARK
 7782                           column=cf:hire_date, timestamp=1514476352028, value=6/9/1981
 7782                           column=cf:manager, timestamp=1514476352028, value=7839
 7782                           column=cf:sal, timestamp=1514476352028, value=2450
 7788                           column=cf:deptno, timestamp=1514476352028, value=20
 7788                           column=cf:designation, timestamp=1514476352028, value=ANALYST
 7788                           column=cf:ename, timestamp=1514476352028, value=SCOTT
 7788                           column=cf:hire_date, timestamp=1514476352028, value=12/9/1982
 7788                           column=cf:manager, timestamp=1514476352028, value=7566
 7788                           column=cf:sal, timestamp=1514476352028, value=3000
 7839                           column=cf:deptno, timestamp=1514476352028, value=10
 7839                           column=cf:designation, timestamp=1514476352028, value=PRESIDENT
 7839                           column=cf:ename, timestamp=1514476352028, value=KING
 7839                           column=cf:hire_date, timestamp=1514476352028, value=11/17/1981
 7839                           column=cf:sal, timestamp=1514476352028, value=5000
 7844                           column=cf:deptno, timestamp=1514476352028, value=30
 7844                           column=cf:designation, timestamp=1514476352028, value=SALESMAN
 7844                           column=cf:ename, timestamp=1514476352028, value=TURNER
 7844                           column=cf:hire_date, timestamp=1514476352028, value=9/8/1981
 7844                           column=cf:manager, timestamp=1514476352028, value=7698
 7844                           column=cf:sal, timestamp=1514476352028, value=1500
 7876                           column=cf:deptno, timestamp=1514476352028, value=20
 7876                           column=cf:designation, timestamp=1514476352028, value=CLERK
 7876                           column=cf:ename, timestamp=1514476352028, value=ADAMS
 7876                           column=cf:hire_date, timestamp=1514476352028, value=1/12/1983
 7876                           column=cf:manager, timestamp=1514476352028, value=7788
 7876                           column=cf:sal, timestamp=1514476352028, value=1100
 7900                           column=cf:deptno, timestamp=1514476352028, value=30
 7900                           column=cf:designation, timestamp=1514476352028, value=CLERK
 7900                           column=cf:ename, timestamp=1514476352028, value=JAMES
 7900                           column=cf:hire_date, timestamp=1514476352028, value=12/3/1981
 7900                           column=cf:manager, timestamp=1514476352028, value=7698
 7900                           column=cf:sal, timestamp=1514476352028, value=950
 7902                           column=cf:deptno, timestamp=1514476352028, value=20
 7902                           column=cf:designation, timestamp=1514476352028, value=ANALYST
 7902                           column=cf:ename, timestamp=1514476352028, value=FORD
 7902                           column=cf:hire_date, timestamp=1514476352028, value=12/3/1981
 7902                           column=cf:manager, timestamp=1514476352028, value=7566
 7902                           column=cf:sal, timestamp=1514476352028, value=3000
 7934                           column=cf:deptno, timestamp=1514476352028, value=10
 7934                           column=cf:designation, timestamp=1514476352028, value=CLERK
 7934                           column=cf:ename, timestamp=1514476352028, value=MILLER
 7934                           column=cf:hire_date, timestamp=1514476352028, value=1/23/1982
 7934                           column=cf:manager, timestamp=1514476352028, value=7782
 7934                           column=cf:sal, timestamp=1514476352028, value=1300
14 row(s) in 1.7580 seconds

Wrapping Up

In this post, we have created a hive to hbase mapping table in order to migrate data from hive to hbase. There is an HBase table on top of our Hive table. If your hive table contains a record which has NULL values for all the columns, in that case, hive and hbase records count would differ. The reason is, HBase table will ignore that record.

Sharing is caring!

Subscribe to our newsletter
Loading

Leave a Reply