Load CSV file into hive ORC table


You have a comma-separated file and you want to create an ORC-formatted table in hive on top of it. Follow the steps below.


Step 1: Sample CSV File

Create a sample CSV file named sample_1.csv.


(You can skip this step if you already have a CSV file; just place it in a local directory.)

Put comma-delimited (,) content in the file. If you created the file on Windows, transfer it to your Linux machine via WinSCP.



I have a local directory named input_files, and I have placed sample_1.csv in it. You can see the content of the file using the command below.

  cat /root/bigdataprogrammers/input_files/sample_1.csv

It will print the file content.
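The original sample file is not reproduced here, so purely as an illustration, a small two-column file matching the (id, code) schema used later could be created like this (the rows and the /tmp path are made-up examples, not the original sample_1 data):

```shell
# Create an illustrative two-column CSV (example rows, example path)
mkdir -p /tmp/input_files
cat > /tmp/input_files/sample_1.csv <<'EOF'
1,AA
2,BB
3,CC
EOF

# Print the file content, as the cat command above would
cat /tmp/input_files/sample_1.csv
```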

Step 2: Copy CSV to HDFS

Run the commands below in the shell for the initial setup.

First, create an HDFS directory ld_csv_hv with an ip directory inside it, using the command below.

  hadoop fs -mkdir -p bdp/ld_csv_hv/ip

Put the file into HDFS using the command below.

  hadoop fs -put /root/bigdataprogrammers/input_files/sample_1.csv bdp/ld_csv_hv/ip/

Check whether the file is available in HDFS using the command below.

  hadoop fs -ls bdp/ld_csv_hv/ip/

NOTE: For me, the default HDFS directory is /user/root/, so the relative path bdp/ld_csv_hv resolves to /user/root/bdp/ld_csv_hv.

Step 3: Create temporary Hive Table and Load data

Now that you have the file in HDFS, you just need to create an external table on top of it. Note that this is just a temporary table.

Use the hive script below to create an external table hv_csv_table in the schema bdp. Run it in the hive CLI.

  CREATE EXTERNAL TABLE bdp.hv_csv_table
  (id STRING, code STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION 'hdfs://sandbox.hortonworks.com:8020/user/root/bdp/ld_csv_hv/ip';

Step 4: Verify data

Please check whether the CSV data shows up in the table using the query below.

  select * from bdp.hv_csv_table;

Step 5: Create an ORC table

We have created the temporary table. Now it's time to create a hive table in ORC format; the main advantage of ORC is that it reduces the on-disk size of a table.
Below is the DDL for the ORC table hv_orc in hive.

  CREATE TABLE bdp.hv_orc
  (
  id STRING,
  code STRING
  )
  STORED AS ORC;

Note the STORED AS ORC clause in the CREATE TABLE statement.
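If you want to double-check the storage format, hive can print the table's metadata; in the output you should see the ORC input/output format classes:

```sql
-- Inspect the table's storage details; the output should mention
-- org.apache.hadoop.hive.ql.io.orc.OrcInputFormat as the input format
DESCRIBE FORMATTED bdp.hv_orc;
```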

Step 6: Copy data from the temporary table

As we have already loaded the temporary table hv_csv_table, it's time to load the data from it into the actual ORC table hv_orc.
Use the statement below to copy the data.

  INSERT INTO TABLE bdp.hv_orc SELECT * FROM bdp.hv_csv_table;
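Note that INSERT INTO appends rows. If you re-run the load and want to replace whatever is already in the ORC table instead, hive also supports an overwrite variant:

```sql
-- Replace, rather than append to, the contents of the ORC table
INSERT OVERWRITE TABLE bdp.hv_orc SELECT * FROM bdp.hv_csv_table;
```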

We created the temporary table on top of the external location hdfs://sandbox.hortonworks.com:8020/user/root/bdp/ld_csv_hv/ip.

As our target is accomplished, we can now remove the CSV data present at that location. Use the command below to delete the temporary data.

  hadoop fs -rm -r -f hdfs://sandbox.hortonworks.com:8020/user/root/bdp/ld_csv_hv/ip/*

We remove this data because we do not want the raw CSV to keep occupying space in HDFS; that is exactly why we created the ORC table. The data now lives only in the ORC table, so we have reduced the stored file size in HDFS, which helps reduce cost.
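To see how much space the ORC table occupies, you can check its directory size in HDFS. The warehouse path below is the Hortonworks sandbox default and may differ in your installation:

```shell
# Size of the ORC table's data (default sandbox warehouse path; adjust for your setup)
hadoop fs -du -s -h /apps/hive/warehouse/bdp.db/hv_orc
```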

Step 7: Output

To see the data in the ORC table, go to the hive prompt and run the query below.

  select * from bdp.hv_orc;


Wrapping Up

ORC, i.e. Optimized Row Columnar, is one of the most widely used file formats for minimizing data storage cost.

We are here to help you, so don't forget to subscribe. Keep learning.


