Import data from MySQL into Hive using Sqoop

Requirement

Suppose there is a table named employee in a MySQL database, and we want this table's data in the Hadoop ecosystem. So, the requirement is to import the data from MySQL into Hive using Sqoop. Once the data is available in Hive, we can process it.

Components Involved

In order to achieve the requirement, we are going to use the following components and tools:

  • MySQL – Source for the ingestion.
  • Hive – Target for the ingestion.
  • Sqoop – Used to import data from MySQL into Hive.

Sample Data

Below are some sample records for this requirement:

sample-data

id,first_name,last_name,gender,designation,city,country
1,Thayne,Mocher,Male,Administrative Officer,Port Colborne,Canada
2,Shelly,Bulfoot,Female,Analyst Programmer,Bacong,Philippines
3,Hercule,Chorlton,Male,Operator,Al Mazzūnah,Tunisia
4,Thorstein,Epton,Male,Administrative Officer,Tayirove,Ukraine
5,Madelena,Savin,Female,Help Desk Technician,Jinjiang,China
6,Adeline,Imesson,Female,Legal Assistant,Fort Beaufort,South Africa
7,Ambros,Richards,Male,Chemical Engineer,Dubiecko,Poland
8,Lilas,Harrowing,Female,Assistant Media Planner,Guayatá,Colombia
9,Freida,Leivers,Female,Legal Assistant,Bangus Kulon,Indonesia
10,Celie,Dolligon,Female,Data Coordiator,Paraty,Brazil
11,Berkley,Orteaux,Male,Assistant Professor,Zinder,Niger
12,Gilburt,Minot,Male,Marketing Assistant,Hanyuan,China
13,Blaine,Treverton,Male,Research Associate,Yuankeng,China
14,Benjamen,Dodd,Male,Assistant Professor,Beberon,Philippines
15,Nikos,Worpole,Male,Human Resources Assistant II,Esmeralda,Cuba

Solution

We are using Sqoop to ingest the data from MySQL into Hive. Here, we use the same database and table name in both MySQL and Hive. Follow the steps below in sequence:

Step 1: Create Table in MySQL

The first step is to have a table in MySQL, so we will create a table named employee under the database db_bdp.

Use the query below to create the table:

create-table

-- create the database if it does not already exist, then switch to it
CREATE DATABASE IF NOT EXISTS db_bdp;
USE db_bdp;

CREATE TABLE employee (
id INT,
first_name VARCHAR(100),
last_name VARCHAR(100),
gender VARCHAR(10),
designation VARCHAR(50), -- widened from 20: values such as 'Structural Analysis Engineer' are longer
city VARCHAR(20),
country VARCHAR(20));

Step 2: Load data into Table

Below are the sample insert queries to load data into the table:

sample
 
insert into employee (id, first_name, last_name, gender, designation, city, country) values (1, 'Jervis', 'Roll', 'Male', 'Director of Sales', 'Thi Tran Lac', 'Vietnam');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (2, 'Gordon', 'Maltster', 'Male', 'Marketing Manager', 'Mabu', 'China');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (3, 'Griff', 'Godsafe', 'Male', 'Actuary', 'Kipit', 'Philippines');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (4, 'Gracie', 'Franken', 'Female', 'Assistant Manager', 'Xiabuji', 'China');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (5, 'Joelly', 'Wellbank', 'Female', 'Account Coordinator', 'Whitehorse', 'Canada');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (6, 'Bab', 'Havock', 'Female', 'Accountant II', 'Basyūn', 'Egypt');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (7, 'Carmine', 'Courage', 'Female', 'Account Coordinator', 'Boyeros', 'Cuba');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (8, 'Estella', 'Marvell', 'Female', 'Structural Analysis Engineer', 'Stettler', 'Canada');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (9, 'Celie', 'Trevaskiss', 'Female', 'Assistant Manager', 'Criuleni', 'Moldova');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (10, 'Madison', 'Ranyell', 'Male', 'Research Associate', 'Angatel', 'Philippines');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (11, 'Haydon', 'Faughey', 'Male', 'Safety Technician IV', 'Masalovka', 'Russia');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (12, 'Michele', 'Zarfai', 'Male', 'Legal Assistant', 'Karatau', 'Kazakhstan');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (13, 'Ruthi', 'Bowmer', 'Female', 'Analog Circuit Design manager', 'Peski', 'Russia');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (14, 'Adolphus', 'Pickthorne', 'Male', 'Senior Developer', 'Mae Fa Luang', 'Thailand');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (15, 'Kat', 'Dymocke', 'Female', 'Geological Engineer', 'Markópoulo Oropoú', 'Greece');

Download the full insert script and execute it. It will insert the data into the table.

Step 3: Data in RDBMS

We have created the table and loaded the data into it. Now, let's look at a few records and count how many rows the table holds.

SELECT * FROM employee LIMIT 10;
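
To check the row count as well (we will compare it against Hive later), a simple count query works; this is plain MySQL, nothing Sqoop-specific:

SELECT COUNT(*) FROM employee;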

Step 4: Create Sqoop Job to ingest data

The template of a Sqoop import job looks like this:

sqoop import \
--connect jdbc:mysql://<host>:<port>/<db-name> \
--driver com.mysql.jdbc.Driver \
--username <username> \
--password <password> \
--table <table_name>

For our requirement, we have created the following Sqoop job:
sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
--split-by id \
--hive-table db_bdp.employee

  • import – the Sqoop tool used when ingesting data from an RDBMS into Hadoop.
  • --connect – the JDBC connection string of the source database.
  • --driver – the JDBC driver class used to connect to MySQL.
  • --username – the MySQL user that has access to the table (add --password or -P if the account requires a password).
  • --table – the source MySQL table to import.
  • --hive-import – tells Sqoop to import the data into a Hive table.
  • --split-by – the column used to split the work across parallel map tasks; we usually use the table's primary key here. If you don't want to specify a split column, use -m 1 instead of --split-by (see the sketch after this list).
  • --hive-table – the Hive table into which Sqoop will ingest the data.
  • \ – the shell line-continuation character, which lets the whole command span multiple lines.
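
As mentioned for --split-by above, here is a minimal sketch of the same job without a split column, running a single mapper (same connection details assumed):

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
-m 1 \
--hive-table db_bdp.employee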

Note that we have not created any Hive table before ingesting the data from MySQL. Sqoop fetches the metadata from the MySQL table and creates the Hive table itself based on that metadata.
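
After the job finishes, you can confirm the schema Sqoop generated by describing the table in Hive. A quick check from the shell (assuming the hive CLI is on the PATH):

hive -e "DESCRIBE db_bdp.employee;"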

Step 5: Job Execution

Once the Sqoop job is ready, let's execute it:

In the background, Sqoop launches a MapReduce job. It consists of map tasks only; there is no reduce task. We can check the number of tasks and their details by accessing the job tracker URL.
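
If you prefer the command line over the job tracker UI, the running application can also be listed there; the exact command depends on your Hadoop version:

# on YARN clusters
yarn application -list

# on classic MapReduce clusters
mapred job -list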

Step 6: Data Validation

After the ingestion job completes successfully, let's validate the data in the Hive table.

Count the records in both the MySQL and the Hive table, as sketched below:
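
A sketch of the comparison, assuming the mysql and hive CLIs are on the PATH and the root account needs no password (as in the job above):

# row count on the MySQL side
mysql -u root db_bdp -e "SELECT COUNT(*) FROM employee;"

# row count on the Hive side
hive -e "SELECT COUNT(*) FROM db_bdp.employee;"

The two counts should match.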

Wrapping Up

In this post, we have seen data ingestion from MySQL into Hive using Sqoop. Here, we loaded all the columns into the target Hive table. You can also control which columns get imported: modify the Sqoop job to add the --columns option and provide the required column names, comma separated, as shown below.
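
For example, a variant of the job that imports only a subset of the columns might look like this (the column list here is purely illustrative):

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--columns "id,first_name,designation" \
--hive-import \
--split-by id \
--hive-table db_bdp.employee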

