Load a TSV file in Pig

Requirement

Assume that you want to load a TSV (tab-separated values) file in Pig and store the output delimited by a pipe (‘|’).

Solution

Follow the steps below:

Step 1: Sample TSV file

Create a sample TSV file named sample_1.tsv.

If you already have sample data, put it in that file with a tab (\t) as the delimiter. If you created the file on Windows, transfer it to your Linux machine, for example with WinSCP.
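
If you have no sample data at hand, a small two-column file can be created directly from the shell. The following is only a minimal sketch: the three rows and the /tmp/pig_demo directory are made up for illustration and are not the content of the author's sample_1.tsv.

```shell
# Hypothetical working directory (the article itself uses /root/bigdataprogrammers/input_files)
INPUT_DIR="${INPUT_DIR:-/tmp/pig_demo/input_files}"
mkdir -p "$INPUT_DIR"

# printf turns \t into a real tab and \n into a newline,
# so each row has two tab-separated fields: id and code
printf '1\tA100\n2\tB200\n3\tC300\n' > "$INPUT_DIR/sample_1.tsv"

# Print the number of tab-separated fields on each line (2 for every row here)
awk -F'\t' '{ print NF }' "$INPUT_DIR/sample_1.tsv"
```

If any line reports a field count other than 2, the file probably contains spaces where tabs were intended.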

You can get the file from here sample_1

My local directory is /root/bigdataprogrammers/input_files, so I have placed sample_1.tsv there.

You can view the content of the file with the following shell command:

 cat /root/bigdataprogrammers/input_files/sample_1.tsv

It prints the file content to the terminal.

Step 2: Load Data in Pig

Now I will load the file in Pig. Because the file is on the local file system, I run Pig in local mode here. In production, Pig usually runs in MapReduce mode against files stored in HDFS.

Enter the following command in the terminal:

  pig -x local

This opens the grunt shell. Run the following command there:

 MyTSVData = LOAD '/root/bigdataprogrammers/input_files/sample_1.tsv' using PigStorage('\t') AS (id:chararray,code:chararray);

You can check whether the file was loaded with the following command (note the terminating semicolon, which the grunt shell requires):

  dump MyTSVData;

It will show the content of relation MyTSVData.

Step 3: Store output data in Pig

You can store the relation MyTSVData with the STORE command. Assume that you want the output file in pipe-delimited format.

Enter the command below to store the relation as a pipe-delimited file. The output is written to a tsv_to_pipe directory, which Pig creates itself. Make sure that a directory with this name does not already exist in your output directory, otherwise the STORE fails.

 STORE MyTSVData INTO '/root/bigdataprogrammers/output_files/tsv_to_pipe' using PigStorage('|');
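
Because STORE fails when the target directory already exists, it can help to check the path from a regular shell before starting Pig. The following is a minimal sketch for local mode, where the path lives on the local file system; the function name output_path_is_free is made up for illustration.

```shell
# Hypothetical helper: succeeds only when the STORE target does not exist yet
output_path_is_free() {
  if [ -e "$1" ]; then
    echo "Already exists: $1 (remove or rename it before running STORE)" >&2
    return 1
  fi
  echo "Free to use: $1"
}

# Check the output directory used in this article
output_path_is_free /root/bigdataprogrammers/output_files/tsv_to_pipe \
  || echo "Pick another output directory or clean up the old one"
```

In MapReduce mode the output path is in HDFS, so the same check would have to be done against HDFS rather than the local disk.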

Step 4: Show Output

Exit the grunt shell by pressing Ctrl+C (or by typing quit).

Use the following command to see the output:

  cat /root/bigdataprogrammers/output_files/tsv_to_pipe/*
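
As a sanity check outside Pig, the same tab-to-pipe conversion can be reproduced with tr and compared by eye with the stored output. This is only a sketch under the assumption that no field contains a tab or a pipe itself; the function name tsv_to_pipe and the demo row are made up for illustration.

```shell
# Hypothetical helper: replace every tab in the given file with a pipe
tsv_to_pipe() {
  tr '\t' '|' < "$1"
}

# Small demo on a throwaway file with one made-up row
demo_file="$(mktemp)"
printf '1\tA100\n' > "$demo_file"
tsv_to_pipe "$demo_file"   # prints 1|A100
rm -f "$demo_file"

# On the article's data the call would be:
#   tsv_to_pipe /root/bigdataprogrammers/input_files/sample_1.tsv
```

If the lines printed by this helper match the lines in the tsv_to_pipe directory, the Pig job converted the delimiter as intended.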

Keep learning.
