Load a pipe-delimited file in Pig

Requirement

Assume that you want to load a file with pipe (|) separated values into Pig, and that the Pig output should be comma (,) delimited and stored in a single directory.

 

Solution

Follow the steps below.

Step 1: Sample file

Create a sample file named sample_1.txt.

Put content in that file, with the values delimited by a pipe (|). If you created the file on Windows, transfer it to your Linux machine, for example with WinSCP.

You can download the file here: sample_1

My local directory is /root/bigdataprogrammers/input_files, so I have placed the sample_1.txt file there.
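
If you do not have the sample file at hand, a minimal sketch like the following creates one. The three rows are made-up illustration data (two columns, matching the id and code fields used in Step 2), and INPUT_DIR is a hypothetical variable that defaults to a throwaway temporary directory. Set it to /root/bigdataprogrammers/input_files if you want to follow the paths used in the rest of this article.

```shell
# INPUT_DIR is an assumption; the article keeps its file in
# /root/bigdataprogrammers/input_files.
INPUT_DIR="${INPUT_DIR:-$(mktemp -d)}"
mkdir -p "$INPUT_DIR"

# Hypothetical rows: two pipe-separated columns per line (id|code).
printf '%s\n' '1|A101' '2|B202' '3|C303' > "$INPUT_DIR/sample_1.txt"

# Print the file to confirm its content.
cat "$INPUT_DIR/sample_1.txt"
```

Using printf with one argument per row keeps the pipe characters literal and ends every row with a newline, which is what PigStorage expects for line-based input.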

You can view the content of that file with the following shell command:

 
 
cat /root/bigdataprogrammers/input_files/sample_1.txt

It will show the file content.

Step 2: Load Data in Pig

Now I will load the file in Pig. Because the file is on the local file system, I run Pig in local mode here. In a production environment, Pig typically runs in MapReduce mode and reads its input from HDFS.

Enter the command below in your terminal (for example, a PuTTY session):

 
 
 pig -x local

It will take you to the grunt shell. Type the command below in the grunt shell:

 
 
  MyPSVData = LOAD '/root/bigdataprogrammers/input_files/sample_1.txt' using PigStorage('|') AS (id:chararray,code:chararray);

You can check whether the file was loaded with the command below:

 
 
  dump MyPSVData;

It will show the content of relation MyPSVData.

Step 3: Store output data in Pig

You can store this relation with the STORE command in Pig. Assume that you want the output file in comma-delimited format.

Enter the command below to store the relation as a comma-delimited file. Pig creates the psv_to_comma directory itself, so make sure that a directory with that name does not already exist under your output directory; otherwise the STORE command fails.

 
 
  STORE MyPSVData INTO '/root/bigdataprogrammers/output_files/psv_to_comma' using PigStorage(',');
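
The same LOAD and STORE statements can also be run non-interactively. The sketch below is one possible way to do that, not part of the original walkthrough: it writes the two statements to a script file and passes it to pig -x local. The variable names INPUT_FILE, OUTPUT_DIR and SCRIPT, and the script file name psv_to_comma.pig, are assumptions made for illustration. The script is written to a temporary directory by default.

```shell
# Assumed locations; the defaults are the paths used in this article.
INPUT_FILE="${INPUT_FILE:-/root/bigdataprogrammers/input_files/sample_1.txt}"
OUTPUT_DIR="${OUTPUT_DIR:-/root/bigdataprogrammers/output_files/psv_to_comma}"
# Hypothetical script name, placed in a temporary directory by default.
SCRIPT="${SCRIPT:-$(mktemp -d)/psv_to_comma.pig}"

# Write the LOAD and STORE statements from Steps 2 and 3 to the script file.
cat > "$SCRIPT" <<EOF
MyPSVData = LOAD '$INPUT_FILE' USING PigStorage('|') AS (id:chararray, code:chararray);
STORE MyPSVData INTO '$OUTPUT_DIR' USING PigStorage(',');
EOF

# STORE fails when the output directory exists, so stop early in that case
# rather than deleting anything automatically.
if [ -e "$OUTPUT_DIR" ]; then
  echo "Output directory already exists: $OUTPUT_DIR" >&2
elif command -v pig > /dev/null 2>&1; then
  pig -x local "$SCRIPT"
else
  echo "pig not found on PATH; script written to $SCRIPT" >&2
fi
```

The check on OUTPUT_DIR only reports the problem. Removing an old output directory is left to you, since it may hold results you still need.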

Step 4: Show Output

Exit the grunt shell by typing quit (or pressing Ctrl+D). Avoid Ctrl+Z, which only suspends the Pig process rather than ending it.

Use the command below to see the output:

 
 
  cat /root/bigdataprogrammers/output_files/psv_to_comma/*
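
As an optional sanity check, you can compare the Pig output with a plain pipe-to-comma substitution of the input. This is only a sketch under two assumptions: every row has exactly the two fields declared in the schema, and no value contains a comma or a pipe itself. The helper name expected_csv and the variables INPUT_FILE and OUTPUT_DIR are hypothetical.

```shell
# Assumed locations; the defaults are the paths used in this article.
INPUT_FILE="${INPUT_FILE:-/root/bigdataprogrammers/input_files/sample_1.txt}"
OUTPUT_DIR="${OUTPUT_DIR:-/root/bigdataprogrammers/output_files/psv_to_comma}"

# Hypothetical helper: print the given file with every pipe replaced by a comma.
expected_csv() {
  tr '|' ',' < "$1"
}

if [ -f "$INPUT_FILE" ] && [ -d "$OUTPUT_DIR" ]; then
  # Pig writes its rows to part-* files; sort both sides so that
  # the order of the part files does not affect the comparison.
  expected="$(expected_csv "$INPUT_FILE" | sort)"
  actual="$(cat "$OUTPUT_DIR"/part-* | sort)"
  if [ "$expected" = "$actual" ]; then
    echo "Pig output matches the expected comma-delimited rows"
  else
    echo "Pig output differs from the expected rows" >&2
  fi
else
  echo "Input file or output directory not found; nothing to compare" >&2
fi
```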

 

Keep learning.

 
