Requirement
Assume that you want to load a CSV file in Pig and store the output delimited by a pipe (‘|’).
Solution
Follow the steps below:
Step 1: Sample CSV file
Create a sample CSV file named sample_1.csv.
If you already have sample data, put it in that file with a comma (,) as the delimiter. If you created the file on Windows, transfer it to your Linux machine, for example with WinSCP.
You can also download the sample file, sample_1.
My local directory is /root/bigdataprogrammers/input_files, so I have placed sample_1.csv in that directory.
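If you have no sample data at hand, the file can also be generated from the shell. The following is a minimal sketch: the function name make_sample_csv and the three rows are hypothetical, chosen only to match the two-column (id, code) schema used in Step 2. No header line is written, because PigStorage would load a header as an ordinary data row.

```shell
# Write a small two-column CSV (id,code) into the given directory.
# The rows below are made-up illustration data, not the original sample_1 file.
make_sample_csv() {
    target_dir="$1"
    mkdir -p "$target_dir" || return 1
    cat > "$target_dir/sample_1.csv" <<'EOF'
1,A101
2,B202
3,C303
EOF
}
```

Calling make_sample_csv /root/bigdataprogrammers/input_files would place the file where the rest of this walkthrough expects it.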
You can view the content of the file with the following shell command:
cat /root/bigdataprogrammers/input_files/sample_1.csv
It prints the content of the file.
Step 2: Load Data in Pig
Now, I will load the file in Pig. Because the file is on the local file system, I run Pig in local mode here. In production, Pig usually runs in MapReduce mode against files stored in HDFS.
Enter the following command in the terminal:
pig -x local
This takes you to the grunt shell. Enter the following command there:
MyCSVData = LOAD '/root/bigdataprogrammers/input_files/sample_1.csv' using PigStorage(',') AS (id:chararray,code:chararray);
You can check whether the file was loaded with the following command:
dump MyCSVData;
It will show the content of relation MyCSVData.
Step 3: Store output data in Pig
You can store this relation with the STORE command in Pig. Assume that you want the output file in pipe-delimited format.
Enter the command below to store the relation as a pipe-delimited file. The output goes into a csv_to_pipe directory, which Pig creates for you. Make sure that a directory with the same name does not already exist in your output directory, otherwise the STORE will fail.
STORE MyCSVData INTO '/root/bigdataprogrammers/output_files/csv_to_pipe' using PigStorage('|');
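The statements from Steps 2 and 3 can also be run without the interactive grunt shell. The sketch below writes them to a script file; the function name write_pig_script and the file name csv_to_pipe.pig are assumptions for illustration, while the paths are the ones used above.

```shell
# Write the LOAD and STORE statements from this walkthrough into a Pig script.
# $1 is the script path to create (hypothetical name, choose your own).
write_pig_script() {
    script_path="$1"
    cat > "$script_path" <<'EOF'
-- Load the comma-delimited input as two chararray fields.
MyCSVData = LOAD '/root/bigdataprogrammers/input_files/sample_1.csv' USING PigStorage(',') AS (id:chararray, code:chararray);
-- Store the relation with '|' as the field delimiter.
STORE MyCSVData INTO '/root/bigdataprogrammers/output_files/csv_to_pipe' USING PigStorage('|');
EOF
}
```

After write_pig_script csv_to_pipe.pig, running pig -x local csv_to_pipe.pig should execute both statements in local mode, provided Pig is installed and the output directory does not exist yet.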
Step 4: Show Output
Exit the grunt shell by pressing Ctrl+C.
Use the following command to see the output:
cat /root/bigdataprogrammers/output_files/csv_to_pipe/*
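Pig writes the result as one or more part files (for example part-m-00000), usually alongside an empty _SUCCESS marker, so the glob above concatenates everything in the directory. A minimal sketch that prints only the part files is shown here; the function name show_pig_output is hypothetical.

```shell
# Print every part file in a Pig output directory, skipping marker files
# such as _SUCCESS. $1 is the output directory.
show_pig_output() {
    out_dir="$1"
    for part in "$out_dir"/part-*; do
        # If the glob matched nothing, the unexpanded pattern is returned.
        [ -f "$part" ] || { echo "no part files in $out_dir" >&2; return 1; }
        cat "$part"
    done
}
```

For this walkthrough the call would be show_pig_output /root/bigdataprogrammers/output_files/csv_to_pipe.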
Wrapping Up
CSV is one of the most widely used file formats. It stores data as comma-separated values, which is why we passed ‘,’ as the delimiter to “PigStorage” while loading the file and ‘|’ while storing it.
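As a quick cross-check outside Pig, the same transformation can be approximated in the shell. This is only a sketch under an assumption: PigStorage splits on every comma without interpreting CSV quoting, so for simple two-column rows with no embedded commas the stored output should match a plain character replacement. The function name csv_to_pipe_preview is hypothetical.

```shell
# Replace every comma on standard input with a pipe.
# Assumes simple rows: no quoted fields and no commas inside values.
csv_to_pipe_preview() {
    tr ',' '|'
}
```

For example, csv_to_pipe_preview < /root/bigdataprogrammers/input_files/sample_1.csv prints a preview that can be compared with the Pig output from Step 4.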