Import data from MySQL into Hive using Sqoop

Requirement

Suppose there is a table named EMPLOYEE in a MySQL database, and we want that table's data in the Hadoop ecosystem. So, the requirement is to import the data from MySQL into Hive using Sqoop. Once the data is available in Hive, we can process it.

Components Involved

To achieve the requirement, we are going to use the following components and tools:

  • MySQL – Source for the ingestion.
  • Hive – Target for the ingestion.
  • Sqoop – Used to import data from MySQL into Hive.

Sample Data

Below are some sample records for this requirement:

id,first_name,last_name,gender,designation,city,country
1,Thayne,Mocher,Male,Administrative Officer,Port Colborne,Canada
2,Shelly,Bulfoot,Female,Analyst Programmer,Bacong,Philippines
3,Hercule,Chorlton,Male,Operator,Al Mazzūnah,Tunisia
4,Thorstein,Epton,Male,Administrative Officer,Tayirove,Ukraine
5,Madelena,Savin,Female,Help Desk Technician,Jinjiang,China
6,Adeline,Imesson,Female,Legal Assistant,Fort Beaufort,South Africa
7,Ambros,Richards,Male,Chemical Engineer,Dubiecko,Poland
8,Lilas,Harrowing,Female,Assistant Media Planner,Guayatá,Colombia
9,Freida,Leivers,Female,Legal Assistant,Bangus Kulon,Indonesia
10,Celie,Dolligon,Female,Data Coordinator,Paraty,Brazil
11,Berkley,Orteaux,Male,Assistant Professor,Zinder,Niger
12,Gilburt,Minot,Male,Marketing Assistant,Hanyuan,China
13,Blaine,Treverton,Male,Research Associate,Yuankeng,China
14,Benjamen,Dodd,Male,Assistant Professor,Beberon,Philippines
15,Nikos,Worpole,Male,Human Resources Assistant II,Esmeralda,Cuba

Solution

We are using Sqoop to ingest the data from MySQL into Hive. Here, we use the same database and table names in both MySQL and Hive. Follow the steps below in sequence:

Step 1: Create Table in MySQL

The first step is to have a table in MySQL. So, in this step, we will create a table named employee under the database db_bdp.

Use the query below to create the table:

CREATE DATABASE IF NOT EXISTS db_bdp;
USE db_bdp;

CREATE TABLE employee (
id INT,
first_name VARCHAR(100),
last_name VARCHAR(100),
gender VARCHAR(10),
designation VARCHAR(50), -- wide enough for values like 'Structural Analysis Engineer'
city VARCHAR(20),
country VARCHAR(20));

Step 2: Load Data into the Table

Below are sample insert queries to load data into the table:

insert into employee (id, first_name, last_name, gender, designation, city, country) values (1, 'Jervis', 'Roll', 'Male', 'Director of Sales', 'Thi Tran Lac', 'Vietnam');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (2, 'Gordon', 'Maltster', 'Male', 'Marketing Manager', 'Mabu', 'China');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (3, 'Griff', 'Godsafe', 'Male', 'Actuary', 'Kipit', 'Philippines');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (4, 'Gracie', 'Franken', 'Female', 'Assistant Manager', 'Xiabuji', 'China');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (5, 'Joelly', 'Wellbank', 'Female', 'Account Coordinator', 'Whitehorse', 'Canada');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (6, 'Bab', 'Havock', 'Female', 'Accountant II', 'Basyūn', 'Egypt');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (7, 'Carmine', 'Courage', 'Female', 'Account Coordinator', 'Boyeros', 'Cuba');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (8, 'Estella', 'Marvell', 'Female', 'Structural Analysis Engineer', 'Stettler', 'Canada');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (9, 'Celie', 'Trevaskiss', 'Female', 'Assistant Manager', 'Criuleni', 'Moldova');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (10, 'Madison', 'Ranyell', 'Male', 'Research Associate', 'Angatel', 'Philippines');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (11, 'Haydon', 'Faughey', 'Male', 'Safety Technician IV', 'Masalovka', 'Russia');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (12, 'Michele', 'Zarfai', 'Male', 'Legal Assistant', 'Karatau', 'Kazakhstan');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (13, 'Ruthi', 'Bowmer', 'Female', 'Analog Circuit Design manager', 'Peski', 'Russia');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (14, 'Adolphus', 'Pickthorne', 'Male', 'Senior Developer', 'Mae Fa Luang', 'Thailand');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (15, 'Kat', 'Dymocke', 'Female', 'Geological Engineer', 'Markópoulo Oropoú', 'Greece');

Download the full insert script and execute it; it will insert the data into the table.

Step 3: Data in RDBMS

We have created the table and loaded the data into it. Now, let's preview a few records to confirm the load:

SELECT * FROM employee LIMIT 10;
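To see how many records the table holds, a simple count works as well:

SELECT COUNT(*) FROM employee;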

Step 4: Create Sqoop Job to ingest data

The template of a Sqoop import job looks like this:

sqoop import \
--connect jdbc:mysql://<host>:<port>/<db-name> \
--driver com.mysql.jdbc.Driver \
--username <username> \
--password <password> \
--table <table_name>

For our requirement, we have created the following Sqoop job:

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
--split-by id \
--hive-table db_bdp.employee
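Note that this job omits the password option and assumes the MySQL root user can connect without one. If your user does require a password, a safer variant adds -P so that Sqoop prompts for it on the console instead of exposing it on the command line:

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
-P \
--table employee \
--hive-import \
--split-by id \
--hive-table db_bdp.employee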

import – used when ingesting data from an RDBMS into Hadoop.

--connect – the JDBC connection string of the source database.

--driver – the JDBC driver class used to connect to MySQL.

--username – the user that has access to the MySQL table.

--hive-import – tells Sqoop to import the data into a Hive table.

--split-by – the column used to split the work among parallel map tasks; we usually keep the table's primary key here. If you don't want to specify a split column, pass -m 1 (a single mapper) instead of --split-by, as shown in the sketch after this list.

--hive-table – the Hive table into which Sqoop will ingest the data.

\ – the line-continuation character that lets a single command span multiple lines.
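As referenced above, here is a minimal single-mapper variant of the same job that drops --split-by in favour of -m 1:

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
-m 1 \
--hive-table db_bdp.employee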

Notice that we did not create any Hive table before ingesting the data from MySQL. Sqoop fetches the metadata from the MySQL table and creates the Hive table itself based on that metadata.
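After the job completes (Step 5 below), you can confirm the auto-generated schema from the Hive shell with standard HiveQL:

DESCRIBE FORMATTED db_bdp.employee;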

Step 5: Job Execution

Once the Sqoop job is ready, let's execute it.

Under the hood, Sqoop launches a MapReduce job. It consists of map tasks only; there are no reduce tasks. We can check the number of tasks and their details from the job tracker URL.
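On a YARN-based cluster (assumed here; classic MapReduce v1 setups expose a JobTracker web UI instead), the running import can also be listed from the command line:

yarn application -list -appStates RUNNING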

Step 6: Data Validation

After the ingestion job completes successfully, let's validate the data in the Hive table.

Count the records in both the MySQL table and the Hive table and compare the results:
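A minimal sketch of the check, assuming the mysql and hive command-line shells: run the count in each system and compare the two numbers; they should match the number of rows you inserted.

-- In the MySQL shell:
SELECT COUNT(*) FROM db_bdp.employee;

-- In the Hive shell:
SELECT COUNT(*) FROM db_bdp.employee;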

Wrapping Up

In this post, we have seen data ingestion from MySQL into Hive using Sqoop. Here, we loaded all the columns into the target Hive table. You can also control which columns are imported: modify the Sqoop job to add the --columns option with a comma-separated list of the required column names.
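For instance, a variant of our job that imports only a subset of columns might look like this (the column list is just an illustration):

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--columns "id,first_name,last_name,city" \
--hive-import \
--split-by id \
--hive-table db_bdp.employee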
