Requirement
Suppose there is a table named EMPLOYEE in a MySQL database, and we want this table's data in the Hadoop ecosystem. So the requirement is to import the data from MySQL into Hive using Sqoop. Once the data is available in Hive, we can process it.
Components Involved
To achieve this requirement, we are going to use the following components and tools:
- MySQL – Source for the ingestion.
- Hive – Target for the ingestion.
- Sqoop – Used to import data from MySQL into Hive.
Sample Data
Below are some sample records for this requirement:
id,first_name,last_name,gender,designation,city,country
1,Thayne,Mocher,Male,Administrative Officer,Port Colborne,Canada
2,Shelly,Bulfoot,Female,Analyst Programmer,Bacong,Philippines
3,Hercule,Chorlton,Male,Operator,Al Mazzūnah,Tunisia
4,Thorstein,Epton,Male,Administrative Officer,Tayirove,Ukraine
5,Madelena,Savin,Female,Help Desk Technician,Jinjiang,China
6,Adeline,Imesson,Female,Legal Assistant,Fort Beaufort,South Africa
7,Ambros,Richards,Male,Chemical Engineer,Dubiecko,Poland
8,Lilas,Harrowing,Female,Assistant Media Planner,Guayatá,Colombia
9,Freida,Leivers,Female,Legal Assistant,Bangus Kulon,Indonesia
10,Celie,Dolligon,Female,Data Coordinator,Paraty,Brazil
11,Berkley,Orteaux,Male,Assistant Professor,Zinder,Niger
12,Gilburt,Minot,Male,Marketing Assistant,Hanyuan,China
13,Blaine,Treverton,Male,Research Associate,Yuankeng,China
14,Benjamen,Dodd,Male,Assistant Professor,Beberon,Philippines
15,Nikos,Worpole,Male,Human Resources Assistant II,Esmeralda,Cuba
Solution
We are using Sqoop to ingest data from MySQL into Hive. Here, we use the same database and table name in both MySQL and Hive. Follow the steps below in sequence:
Step 1: Create Table in MySQL
The first step is to have a table in MySQL, so we will create a table named employee under the database db_bdp.
Use the query below to create the table:
USE db_bdp;

CREATE TABLE employee (
  id INT,
  first_name VARCHAR(100),
  last_name VARCHAR(100),
  gender VARCHAR(10),
  designation VARCHAR(50),
  city VARCHAR(20),
  country VARCHAR(20)
);

Note that designation is declared as VARCHAR(50) because some sample values, such as "Structural Analysis Engineer", are longer than 20 characters.
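If the database db_bdp does not exist yet, create it first (assuming your MySQL user has the privilege to do so):

CREATE DATABASE IF NOT EXISTS db_bdp;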
Step 2: Load Data into the Table
Below are sample insert queries to load data into the table:
insert into employee (id, first_name, last_name, gender, designation, city, country) values (1, 'Jervis', 'Roll', 'Male', 'Director of Sales', 'Thi Tran Lac', 'Vietnam');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (2, 'Gordon', 'Maltster', 'Male', 'Marketing Manager', 'Mabu', 'China');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (3, 'Griff', 'Godsafe', 'Male', 'Actuary', 'Kipit', 'Philippines');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (4, 'Gracie', 'Franken', 'Female', 'Assistant Manager', 'Xiabuji', 'China');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (5, 'Joelly', 'Wellbank', 'Female', 'Account Coordinator', 'Whitehorse', 'Canada');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (6, 'Bab', 'Havock', 'Female', 'Accountant II', 'Basyūn', 'Egypt');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (7, 'Carmine', 'Courage', 'Female', 'Account Coordinator', 'Boyeros', 'Cuba');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (8, 'Estella', 'Marvell', 'Female', 'Structural Analysis Engineer', 'Stettler', 'Canada');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (9, 'Celie', 'Trevaskiss', 'Female', 'Assistant Manager', 'Criuleni', 'Moldova');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (10, 'Madison', 'Ranyell', 'Male', 'Research Associate', 'Angatel', 'Philippines');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (11, 'Haydon', 'Faughey', 'Male', 'Safety Technician IV', 'Masalovka', 'Russia');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (12, 'Michele', 'Zarfai', 'Male', 'Legal Assistant', 'Karatau', 'Kazakhstan');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (13, 'Ruthi', 'Bowmer', 'Female', 'Analog Circuit Design manager', 'Peski', 'Russia');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (14, 'Adolphus', 'Pickthorne', 'Male', 'Senior Developer', 'Mae Fa Luang', 'Thailand');
insert into employee (id, first_name, last_name, gender, designation, city, country) values (15, 'Kat', 'Dymocke', 'Female', 'Geological Engineer', 'Markópoulo Oropoú', 'Greece');
Download the full insert script and execute it; it will insert the data into the table.
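Alternatively, if you saved the sample records above as a CSV file, you could bulk-load them instead of running individual inserts. This is a sketch assuming a hypothetical file path and that local_infile is enabled on your MySQL server:

-- Bulk-load the CSV; skip the header row
LOAD DATA LOCAL INFILE '/path/to/employee.csv'
INTO TABLE employee
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;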
Step 3: Data in RDBMS
We have created the table and loaded the data into it. Now, let's take a look at the records in this table:
SELECT * FROM employee LIMIT 10;
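To see the total number of records as well, run a count; we will compare this number against Hive later, in Step 6:

SELECT COUNT(*) FROM employee;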
Step 4: Create Sqoop Job to Ingest Data
The template of a Sqoop import job looks like this:
sqoop import \
--connect jdbc:mysql://<host>:<port>/<db-name> \
--driver com.mysql.jdbc.Driver \
--username <username> \
--password <password> \
--table <table_name>
For our requirement, we have created the following Sqoop job:
sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
--split-by id \
--hive-table db_bdp.employee

If your MySQL user requires a password, add --password <password>, or -P to be prompted at runtime.
- import – Used when ingesting data from an RDBMS into Hadoop.
- --connect – The JDBC connection string of the source database.
- --driver – The JDBC driver class used to connect to MySQL.
- --username – The MySQL user that has access to the table.
- --hive-import – Tells Sqoop to load the imported data into a Hive table.
- --split-by – The column used to split the import across mappers; we usually use the table's primary key here. If you don't want to specify a split column, use -m 1 (a single mapper) instead of --split-by (see the example after this list).
- --hive-table – The Hive table into which Sqoop will ingest the data.
- \ – Line continuation, used to spread a single command across multiple lines.
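For reference, here is the same job with a single mapper instead of a split column; this is a sketch using the same connection details as above:

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
-m 1 \
--hive-table db_bdp.employee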
Note that we have not created any Hive table before ingesting the data from MySQL. Sqoop fetches the metadata from the MySQL table and creates the Hive table itself based on that metadata.
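Once the job has run, you can double-check the schema Sqoop generated from the Hive shell (assuming the Hive CLI or Beeline is available):

DESCRIBE FORMATTED db_bdp.employee;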
Step 5: Job Execution
Once the Sqoop job is ready, let's execute it:
In the background, it launches a MapReduce job. It will have only map tasks, no reducer tasks. We can check the number of tasks and other details by accessing the job tracker URL.
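If you prefer the command line over the job tracker UI, the Hadoop client can also list the running MapReduce jobs (assuming the client is available on the node):

# On older Hadoop versions, use: hadoop job -list
mapred job -list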
Step 6: Data Validation
After the ingestion job completes successfully, let's validate the data in the Hive table:
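For a quick look at the ingested rows, query the Hive table just as we queried MySQL in Step 3:

SELECT * FROM db_bdp.employee LIMIT 10;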
Count the records in both the MySQL and Hive tables:
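The same count query works on both sides, and the two numbers should match:

-- In the MySQL shell
SELECT COUNT(*) FROM db_bdp.employee;

-- In the Hive shell
SELECT COUNT(*) FROM db_bdp.employee;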
Wrapping Up
In this post, we have seen data ingestion from MySQL into Hive using Sqoop. Here, we loaded all the columns of the source table into the target Hive table. You can also control which columns are imported: modify the Sqoop job to pass the required column names, comma-separated, with the --columns option.
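For instance, here is a sketch that imports only a few columns; the column list is just an example, so substitute whichever columns you need:

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--columns "id,first_name,last_name" \
--hive-import \
--split-by id \
--hive-table db_bdp.employee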