Load CSV file in Pig


Suppose you want to load a CSV file in Pig, write it back out as pipe-delimited data, and store the result in a single output directory.


Follow the steps below.

Step 1: Sample CSV file

Create a sample CSV file named sample_1.csv.

Add comma-delimited content to the file. If you created the file on Windows, transfer it to your Linux machine with a tool such as WinSCP.

You can also download the sample file: sample_1

My local directory is /root/bigdataprogrammers/input_files, so I placed sample_1.csv there.
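If you do not have the sample file at hand, a minimal sketch like the one below creates one from the shell. The three rows are made-up values for illustration only (they are not the content of the original sample_1 file); they simply have two comma-separated columns to match the id and code fields used when loading the file in Step 2. The INPUT_DIR variable is also an assumption of this sketch.

```shell
# Hypothetical sample data: two comma-separated columns (id, code).
# INPUT_DIR defaults to $HOME/bigdataprogrammers/input_files, which is
# /root/bigdataprogrammers/input_files when you are logged in as root.
INPUT_DIR="${INPUT_DIR:-$HOME/bigdataprogrammers/input_files}"
mkdir -p "$INPUT_DIR"

# Quoted 'EOF' keeps the rows literal (no variable expansion).
cat > "$INPUT_DIR/sample_1.csv" <<'EOF'
1,A101
2,B202
3,C303
EOF

# Show what was written.
cat "$INPUT_DIR/sample_1.csv"
```

Replace the rows with your own data; only the comma delimiter and the two-column layout matter for the later steps.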

You can view the content of the file with the following shell command:

  cat /root/bigdataprogrammers/input_files/sample_1.csv

This prints the file content.

Step 2: Load Data in Pig

Next, load the file in Pig. Because the file is on the local file system, I run Pig in local mode here. In production, Pig usually runs in MapReduce mode against HDFS.

Enter the following command in your terminal (for example, PuTTY):

  pig -x local

This takes you to the grunt shell. Type the following command in the grunt shell:

  MyCSVData = LOAD '/root/bigdataprogrammers/input_files/sample_1.csv' using PigStorage(',') AS (id:chararray,code:chararray);

You can check whether the file was loaded with the following command:

  dump MyCSVData;

This prints the content of the relation MyCSVData.
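The load and dump statements can also be saved to a script file and run without opening the grunt shell. The sketch below is one way to do that; the script name load_csv.pig is a hypothetical choice, and the check for the pig launcher is only there so the snippet does not fail on a machine without Pig.

```shell
# Hypothetical script name; change it to whatever you prefer.
SCRIPT="${SCRIPT:-load_csv.pig}"

# Write the Pig Latin statements to the script file.
cat > "$SCRIPT" <<'EOF'
-- Load the comma-delimited file with an explicit two-column schema.
MyCSVData = LOAD '/root/bigdataprogrammers/input_files/sample_1.csv'
            USING PigStorage(',') AS (id:chararray, code:chararray);
-- Print the relation to the console.
DUMP MyCSVData;
EOF

# Run the script in local mode only if the pig launcher is on PATH.
if command -v pig >/dev/null 2>&1; then
  pig -x local "$SCRIPT"
else
  echo "pig not found on PATH; script written to $SCRIPT"
fi
```

Running a script this way is handy once the statements work interactively and you want to repeat them.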

Step 3: Store output data in Pig

You can store this relation with the STORE command. Here the output should be pipe delimited.

Enter the command below to store the relation as pipe-delimited data. Pig creates the csv_to_pipe directory itself, so make sure a directory with that name does not already exist under your output directory; otherwise the STORE fails.

  STORE MyCSVData INTO '/root/bigdataprogrammers/output_files/csv_to_pipe' using PigStorage('|');
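Since the output directory must not exist before the STORE runs, it can help to check for it from a regular shell first. The helper below is a small sketch; the function name check_output_dir is my own and not part of Pig.

```shell
# check_output_dir PATH
# Returns 0 if PATH is free to use as a STORE target, 1 if it already exists.
check_output_dir() {
  if [ -e "$1" ]; then
    echo "exists: $1 (remove or rename it before running STORE)" >&2
    return 1
  fi
  echo "ok: $1 does not exist yet"
}

# Example call with the directory used in this post:
# check_output_dir /root/bigdataprogrammers/output_files/csv_to_pipe
```

The helper only reports; it deliberately does not delete anything, so you decide whether to remove or rename an old output directory.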

Step 4: Show Output

Exit the grunt shell by typing quit or pressing Ctrl+D.

Use the command below to see the output:

  cat /root/bigdataprogrammers/output_files/csv_to_pipe/*
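Pig writes the result as one or more part files inside the output directory (typically named part-*, possibly next to an empty _SUCCESS marker). If you need a single pipe-delimited file, a sketch like the following concatenates them. The function name merge_parts and the target file name in the example call are assumptions for illustration.

```shell
# merge_parts DIR OUTFILE
# Concatenate all part-* files in DIR into OUTFILE.
merge_parts() {
  dir="$1"
  out="$2"
  # Expand the glob into the positional parameters of this function.
  set -- "$dir"/part-*
  # If nothing matched, $1 is still the literal pattern and does not exist.
  [ -e "$1" ] || { echo "no part files in $dir" >&2; return 1; }
  cat "$@" > "$out"
}

# Example call with the directory used in this post (target name is hypothetical):
# merge_parts /root/bigdataprogrammers/output_files/csv_to_pipe \
#             /root/bigdataprogrammers/output_files/sample_1_pipe.txt
```

The explicit check avoids silently creating an empty file when the directory holds no part files, for example after a failed STORE.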

Keep learning.

