Load CSV file in hive

Requirement

If you have comma separated file and you want to create a table in the hive on top of it (need to load CSV file in hive), then follow the below steps.

Solution

Step 1: Sample CSV File

Create a sample CSV file named as sample_1.csv.

You can download the sample file from here sample_1

(You can skip this step if you already have a CSV file, just place it into a local directory.)

If you have any sample data with you, then put the content in that file with delimiter comma (,). If you have created a file in windows, then transfer it to your Linux machine via WinSCP.

please refer below screenshot.

I have a local directory named as input_files, so I have placed a sample_1.CSV file in that directory. You can see the content of that file using below command:

 cat /root/bigdataprogrammers/input_files/sample_1.csv

It will show the content of the file:-

Step 2: Copy CSV to HDFS

Run the below commands in the shell for initial setup.

First, create a Hdfs directory named as ld_csv_hv and ip using below command.

 hadoop fs -mkdir bdp/ld_csv_hv
 hadoop fs -mkdir bdp/ld_csv_hv/ip

Put the file in the created Hdfs directory using below command:

 hadoop fs -put /root/bigdataprogrammers/input_files/sample_1.csv bdp/ld_csv_hv/ip/

Check whether the file is available in Hdfs or not using below command:

 hadoop fs -ls bdp/ld_csv_hv/ip/

NOTE: – For me, the default Hdfs directory is /user/root/.

Step 3: Create Hive Table and Load data

Now, you have the file in Hdfs, you just need to create an external table on top of it.

Use below hive scripts to create an external table named as csv_table in schema bdp. Run below script in hive CLI.

  CREATE SCHEMA IF NOT EXISTS bdp;

CREATE EXTERNAL TABLE IF NOT EXISTS bdp.hv_csv_table
(id STRING,Code STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://sandbox.hortonworks.com:8020/user/root/bdp/ld_csv_hv/ip';

Step 4: Verify data

Please check whether CSV data is showing in the table or not using below command:

 select * from bdp.hv_csv_table;

Wrapping Up

CSV is the most used file format. It stores data as comma-separated values that’s why we have used a ‘,’ delimiter in “fields terminated By” option while the creation of hive table.

Don’t miss the tutorial on Top Big data courses on Udemy you should Buy

Read Interview QA on Big Data