About Code: It often happens that you receive data from many systems, and each system uses a different date format, but the output needs one specific date format. Let's say you are receiving date strings like: 12-01-2018 12:22:33, 2018/12/01 12:22:33 …

Requirement: You have the marks of all the students of a class, with roll numbers, in a CSV file. You need to calculate the percentage of each student using Pig. Given: Download the sample CSV file, which has 7 columns; the 1st column is the roll number and the other 6 columns are subject1 …
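A minimal sketch of how this requirement could be approached in Pig Latin. The file name `marks.csv`, the column names, and the maximum of 100 marks per subject are assumptions for illustration; the post's own sample file may differ.

```pig
-- Hypothetical file and column names: roll number plus six subject columns.
marks = LOAD 'marks.csv' USING PigStorage(',')
        AS (roll_no:int, sub1:int, sub2:int, sub3:int, sub4:int, sub5:int, sub6:int);

-- Assumption: each subject is scored out of 100, so the maximum total is 600.
-- Multiplying by 100.0 forces floating-point arithmetic instead of integer division.
percentages = FOREACH marks GENERATE
    roll_no,
    (sub1 + sub2 + sub3 + sub4 + sub5 + sub6) * 100.0 / 600 AS percentage;

DUMP percentages;
```

If the CSV has a header row, its values fail the int cast and come through as nulls, so you may want to filter out rows where `roll_no IS NULL`.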

Requirement: Assume that you want to load a file containing timestamp values (yyyy-MM-dd HH:mm:ss) into Pig and, after loading, add one day to each timestamp value. Solution: Please follow the steps below. Step 1: Sample file. Create a sample file named timestamp_sample.txt and put timestamp values in …
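One possible way to do this, sketched with Pig's built-in date functions (available from Pig 0.11 onward). It assumes one timestamp per line in `timestamp_sample.txt`; the relation and field names are made up for illustration.

```pig
-- Each line holds one timestamp; it contains no tab, so the default loader
-- reads the whole line into a single field.
ts_data = LOAD 'timestamp_sample.txt' AS (ts:chararray);

-- Parse the string, add an ISO 8601 duration of one day (P1D),
-- and format the result back to the original pattern.
next_day = FOREACH ts_data GENERATE
    ts,
    ToString(
        AddDuration(ToDate(ts, 'yyyy-MM-dd HH:mm:ss'), 'P1D'),
        'yyyy-MM-dd HH:mm:ss'
    ) AS ts_plus_one_day;

DUMP next_day;
```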

Requirement: Assume you have an XML file that was transferred to your local system by another application. The file contains customer data, which needs to be processed using Pig. The challenge here is that the file is not a simple text or CSV file; it is the …
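A rough sketch of one common approach, using the `XMLLoader` from Piggybank. The jar path, the file name `customers.xml`, and the tag names (`customer`, `name`, `city`) are all hypothetical, since the structure of the actual file is not shown here.

```pig
-- Path to piggybank.jar is an assumption; adjust it to your installation.
REGISTER piggybank.jar;

-- XMLLoader returns each <customer>...</customer> element as one chararray.
customers = LOAD 'customers.xml'
            USING org.apache.pig.piggybank.storage.XMLLoader('customer')
            AS (doc:chararray);

-- Pull individual values out of each element with regular expressions.
-- The tag names below are placeholders for whatever the real file contains.
parsed = FOREACH customers GENERATE
    REGEX_EXTRACT(doc, '<name>(.*?)</name>', 1) AS name,
    REGEX_EXTRACT(doc, '<city>(.*?)</city>', 1) AS city;

DUMP parsed;
```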

Requirement: Assume that you want to load a file with pipe (|) separated values into Pig, and the output of Pig should be comma (,) delimited and stored in one directory. Solution: Please follow the steps below. Step 1: Sample file. Create a sample file named sample_1.txt. Put …
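In outline, the conversion could look like the sketch below: `PigStorage` takes the field delimiter as its argument for both loading and storing. The output directory name `output_csv` is an assumption.

```pig
-- Read pipe-delimited input. No schema is declared, so the fields
-- pass through unchanged as bytearrays.
data = LOAD 'sample_1.txt' USING PigStorage('|');

-- Write the same records comma-delimited into a single output directory.
-- The directory name is hypothetical and must not already exist.
STORE data INTO 'output_csv' USING PigStorage(',');
```

The same pattern applies to any other pair of delimiters: change the argument of `PigStorage` on the load side, the store side, or both.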

Requirement: Assume that you want to load a TSV (tab-separated values) file into Pig, and the output of Pig should be pipe delimited and stored in one directory. Solution: Please follow the steps below. Step 1: Sample TSV file. Create a sample TSV file named sample_1.tsv. Put content …

Requirement: You have a Pig script that expects some variables to be passed from a shell script. Say the name of the Pig script is daily_audit.pig. It expects three variables, which are as follows: ip_loc, no_of_emp, op_loc. Solution: Step 1: Let's see the content of daily_audit.pig. Daily_Audit = …
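To illustrate the mechanism, here is a hypothetical script (not the post's actual daily_audit.pig) that consumes the three parameters. Pig substitutes `$name` placeholders with values supplied through `-param` on the command line; the paths, the value 100, and the schema are made up.

```pig
-- Invocation from a shell script (values are hypothetical):
--   pig -param ip_loc=/data/in -param no_of_emp=100 -param op_loc=/data/out -f daily_audit.pig

-- $ip_loc is replaced with the input location before the script runs.
emp = LOAD '$ip_loc' USING PigStorage(',') AS (emp_id:int, emp_name:chararray);

-- $no_of_emp is used here as a row limit, purely as an example of a numeric parameter.
first_n = LIMIT emp $no_of_emp;

-- $op_loc is replaced with the output location.
STORE first_n INTO '$op_loc' USING PigStorage(',');
```

In the shell script, the values after each `name=` would typically come from shell variables, for example `-param ip_loc="$INPUT_DIR"`, where `INPUT_DIR` is again a hypothetical name.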

Requirement: In the source data, you have each user's mobile connection type and Id. There are four possible connection types: “POSTP, PREP, CLS, PEND”. It is required to get the Id of only those users whose connection type is in “POSTP, PREP, blank or null”. If the blank is present …
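A minimal sketch of such a filter in Pig Latin. The file name `users.csv`, the comma delimiter, and the field names are assumptions; the handling of blanks in the full post may differ from what is shown here.

```pig
-- Hypothetical input: one user per line as id,conn_type.
users = LOAD 'users.csv' USING PigStorage(',') AS (id:chararray, conn_type:chararray);

-- Keep POSTP and PREP, plus rows where the connection type is blank or null.
-- Nulls must be tested with IS NULL; comparing null with == never matches.
wanted = FILTER users BY
    (conn_type == 'POSTP') OR
    (conn_type == 'PREP') OR
    (conn_type == '') OR
    (conn_type IS NULL);

-- Only the Id is required in the output.
ids = FOREACH wanted GENERATE id;

DUMP ids;
```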

Requirement: You have two tables named A and B, and you want to perform all types of joins in Pig Latin. This will help you understand how joins work in Pig. Solution: Step 1: Input files. Download the files from here and place them in a local directory. File …
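As a quick reference, the join types could be written roughly as below. The file names and the two-column schemas are hypothetical stand-ins for the post's input files.

```pig
-- Hypothetical inputs; outer joins need a declared schema on the side that gets null-padded.
A = LOAD 'file_a.txt' USING PigStorage(',') AS (id:int, name:chararray);
B = LOAD 'file_b.txt' USING PigStorage(',') AS (id:int, city:chararray);

-- Inner join: only ids present in both A and B.
inner_join = JOIN A BY id, B BY id;

-- Left outer join: every row of A, with nulls where B has no match.
left_join = JOIN A BY id LEFT OUTER, B BY id;

-- Right outer join: every row of B, with nulls where A has no match.
right_join = JOIN A BY id RIGHT OUTER, B BY id;

-- Full outer join: every row of both, null-padded on either side.
full_join = JOIN A BY id FULL OUTER, B BY id;

-- Cross join: every combination of a row from A with a row from B.
cross_join = CROSS A, B;

DUMP inner_join;
```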

Requirement: Assume that you want to load a CSV file into Pig, and the output of Pig should be pipe delimited and stored in one directory. Solution: Please follow the steps below. Step 1: Sample CSV file. Create a sample CSV file named sample_1.csv. Put content in that …