Import data from MySQL into Hive using Sqoop

Requirement

Suppose there is a table named EMPLOYEE in a MySQL database, and we want this table's data in the Hadoop ecosystem. The requirement, then, is to import the data from MySQL into Hive using Sqoop. Once the data is available in Hive, we can process it there.

Components Involved

To achieve this requirement, we are going to use the following components and tools:

  • MySQL – source for the ingestion.
  • Hive – target for the ingestion.
  • Sqoop – used to import the data from MySQL into Hive.

Sample Data

Below are some sample records for this requirement:

sample-data

id,first_name,last_name,gender,designation,city,country
1,Thayne,Mocher,Male,Administrative Officer,Port Colborne,Canada
2,Shelly,Bulfoot,Female,Analyst Programmer,Bacong,Philippines
3,Hercule,Chorlton,Male,Operator,Al Mazzūnah,Tunisia
4,Thorstein,Epton,Male,Administrative Officer,Tayirove,Ukraine
5,Madelena,Savin,Female,Help Desk Technician,Jinjiang,China
6,Adeline,Imesson,Female,Legal Assistant,Fort Beaufort,South Africa
7,Ambros,Richards,Male,Chemical Engineer,Dubiecko,Poland
8,Lilas,Harrowing,Female,Assistant Media Planner,Guayatá,Colombia
9,Freida,Leivers,Female,Legal Assistant,Bangus Kulon,Indonesia
10,Celie,Dolligon,Female,Data Coordiator,Paraty,Brazil
11,Berkley,Orteaux,Male,Assistant Professor,Zinder,Niger
12,Gilburt,Minot,Male,Marketing Assistant,Hanyuan,China
13,Blaine,Treverton,Male,Research Associate,Yuankeng,China
14,Benjamen,Dodd,Male,Assistant Professor,Beberon,Philippines
15,Nikos,Worpole,Male,Human Resources Assistant II,Esmeralda,Cuba

Solution

We are using Sqoop to ingest the data from MySQL into Hive. Here, we use the same database and table names in both MySQL and Hive. Follow the steps below in sequence:

Step 1: Create Table in MySQL

The first step is to have a table in MySQL. In this step, we will create a table named employee under the database db_bdp.


Use the query below to create the table:

create-table

USE db_bdp;
CREATE TABLE employee (
id INT,
first_name VARCHAR(100),
last_name VARCHAR(100),
gender VARCHAR(10),
designation VARCHAR(50), -- widened from 20 so longer titles in the data fit
city VARCHAR(20),
country VARCHAR(20));
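If the database db_bdp does not already exist, create it first (standard MySQL DDL):

-- Create the database that will hold the employee table
CREATE DATABASE IF NOT EXISTS db_bdp;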

Step 2: Load data into Table

Below are sample insert queries to load data into the table:

sample

INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (1, 'Jervis', 'Roll', 'Male', 'Director of Sales', 'Thi Tran Lac', 'Vietnam');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (2, 'Gordon', 'Maltster', 'Male', 'Marketing Manager', 'Mabu', 'China');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (3, 'Griff', 'Godsafe', 'Male', 'Actuary', 'Kipit', 'Philippines');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (4, 'Gracie', 'Franken', 'Female', 'Assistant Manager', 'Xiabuji', 'China');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (5, 'Joelly', 'Wellbank', 'Female', 'Account Coordinator', 'Whitehorse', 'Canada');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (6, 'Bab', 'Havock', 'Female', 'Accountant II', 'Basyūn', 'Egypt');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (7, 'Carmine', 'Courage', 'Female', 'Account Coordinator', 'Boyeros', 'Cuba');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (8, 'Estella', 'Marvell', 'Female', 'Structural Analysis Engineer', 'Stettler', 'Canada');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (9, 'Celie', 'Trevaskiss', 'Female', 'Assistant Manager', 'Criuleni', 'Moldova');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (10, 'Madison', 'Ranyell', 'Male', 'Research Associate', 'Angatel', 'Philippines');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (11, 'Haydon', 'Faughey', 'Male', 'Safety Technician IV', 'Masalovka', 'Russia');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (12, 'Michele', 'Zarfai', 'Male', 'Legal Assistant', 'Karatau', 'Kazakhstan');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (13, 'Ruthi', 'Bowmer', 'Female', 'Analog Circuit Design manager', 'Peski', 'Russia');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (14, 'Adolphus', 'Pickthorne', 'Male', 'Senior Developer', 'Mae Fa Luang', 'Thailand');
INSERT INTO employee (id, first_name, last_name, gender, designation, city, country) VALUES (15, 'Kat', 'Dymocke', 'Female', 'Geological Engineer', 'Markópoulo Oropoú', 'Greece');

Download the full insert script and execute it; it will insert the data into the table.
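One way to execute the downloaded script from the command line is via the mysql client (a sketch; the file name employee_inserts.sql is hypothetical):

# Run the insert script against the db_bdp database; -p prompts for the password
mysql -u root -p db_bdp < employee_inserts.sql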


Step 3: Data in RDBMS

We have created the table and loaded the data into it. Now, let's take a look at a few of the records:

SELECT * FROM employee LIMIT 10;
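Since we will compare record counts after the ingestion, it is also worth noting the total count on the MySQL side (standard SQL):

-- Total number of records in the source table
SELECT COUNT(*) FROM employee;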

Step 4: Create Sqoop Job to ingest data

The template of a Sqoop import job looks like this:

sqoop import \
--connect jdbc:mysql://<host>:<port>/<db-name> \
--driver com.mysql.jdbc.Driver \
--username <username> \
--password <password> \
--table <table_name>

For our requirement, we have created the following Sqoop job:

 
 
sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--hive-import \
--split-by id \
--hive-table db_bdp.employee

  • import – used when ingesting data from an RDBMS into Hadoop.
  • --connect – the JDBC connection string for the source database.
  • --driver – the JDBC driver class used to connect to MySQL.
  • --username – the MySQL user that has access to the table (add --password, or -P to be prompted, if the account requires one).
  • --hive-import – tells Sqoop to import the data into a Hive table.
  • --split-by – the column used to split the import across parallel map tasks; we usually use the table's primary key here. If you don't want to specify a split column, pass -m 1 instead of --split-by.
  • --hive-table – the Hive table into which Sqoop will ingest the data.
  • \ – the shell line-continuation character, used to spread the single command across multiple lines.

Note that we have not created any Hive table before ingesting the data from MySQL. Sqoop fetches the metadata from the MySQL table and creates the Hive table itself based on that metadata.
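To confirm the schema that Sqoop generated, you can describe the table on the Hive side (standard Hive DDL):

-- Show the columns, types, and storage details of the auto-created table
DESCRIBE FORMATTED db_bdp.employee;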

Step 5: Job Execution

Once we are done with creating the Sqoop job, let's execute it.

In the background, it launches a MapReduce job. Here, it will have only map tasks and no reduce task. We can check the number of tasks and their details by accessing the job tracker URL.
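For example, on a YARN-based cluster you can list the running application while the import is in progress (a sketch, assuming a default YARN setup; the ResourceManager web UI is typically at port 8088):

# The Sqoop import shows up as a MapReduce application
yarn application -list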

Step 6: Data Validation

After successful completion of the ingestion job, let's validate the data in the Hive table.

Count the records in the MySQL and Hive tables and compare the results:
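A minimal validation is to run the same count on both sides and check that the numbers match (standard SQL; works in both the MySQL and Hive shells):

-- Run in MySQL
SELECT COUNT(*) FROM employee;

-- Run in Hive
SELECT COUNT(*) FROM db_bdp.employee;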

Wrapping Up

In this post, we have seen data ingestion from MySQL to Hive using Sqoop. Here, we loaded all the columns of the table into the target Hive table. You can also control which columns are imported: modify the Sqoop job to pass the --columns option with a comma-separated list of the required column names.
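For example, a column-restricted variant of the job above might look like this (a sketch; the column list is illustrative):

sqoop import \
--connect jdbc:mysql://localhost:3306/db_bdp \
--driver com.mysql.jdbc.Driver \
--username root \
--table employee \
--columns "id,first_name,last_name,designation" \
--hive-import \
--split-by id \
--hive-table db_bdp.employee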
