Pass variables from shell script to pig script

Requirement

You have a Pig script that expects some variables to be passed in from a shell script. Say the Pig script is named daily_audit.pig and it expects the following three variables:

  • ip_loc
  • no_of_emp
  • op_loc

Solution

Step 1:

Let’s look at the content of daily_audit.pig:

 
 
Daily_Audit = LOAD '${ip_loc}' USING PigStorage(',') AS (Company:chararray, empl:int);
Audit = FILTER Daily_Audit BY empl > ${no_of_emp};
STORE Audit INTO '${op_loc}' USING PigStorage(',');

 

All three variables have to be supplied from the shell script. Focus on the variables here rather than on the logic of the Pig script.

Step 2: Assign the variables

Let’s declare these three variables in the shell script:

 
 
Input_location=bigdataprogrammers/ip/
employees=3000
output_location=bigdataprogrammers/op/

You can change these values whenever you need to, which is the point of assigning them in the shell script. In real projects, these values are usually taken from the output of another process.
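As a rough sketch of taking a value from another process, the snippet below captures it with command substitution and checks it before use. Here get_employee_threshold is a hypothetical stand-in for the upstream process and is_whole_number is a helper added only for illustration; neither is part of the original script.

```shell
# Hypothetical stand-in for an upstream process that prints the threshold.
get_employee_threshold() {
  echo 3000
}

# Helper (an assumption for this sketch): succeeds only for a non-empty string of digits.
is_whole_number() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Capture the output of the other process with command substitution.
employees=$(get_employee_threshold)

# Pig substitutes parameters as plain text, so validate the value before passing it on.
if ! is_whole_number "$employees"; then
  echo "employees must be a whole number, got: '$employees'" >&2
  exit 1
fi

Input_location=bigdataprogrammers/ip/
output_location=bigdataprogrammers/op/
```

Validating in the shell keeps a malformed value from reaching the filter expression in the Pig script, where it would only fail after the job has been submitted.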

Step 3: Call the Pig script

Once the variables are assigned, we can pass them when calling the Pig script.
Here is the command:

 
 
pig -f "hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig" -param ip_loc=$Input_location -param no_of_emp=$employees -param op_loc=$output_location

In the command above, the $ sign reads the value of a shell variable, which is then assigned to the parameter defined in the Pig script. For example, ip_loc is used in the Pig script, while Input_location is defined in the shell script. So you write ip_loc=$Input_location, where the left side is the Pig parameter and the right side is the shell variable. You must use a separate -param for each variable when calling the Pig script.
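Putting Steps 2 and 3 together, a wrapper script might look roughly like the sketch below. The names PIG_BIN and run_daily_audit are assumptions introduced for illustration, and the values are quoted so that a path containing spaces would still arrive as a single argument.

```shell
# Shell variables from Step 2.
Input_location="bigdataprogrammers/ip/"
employees=3000
output_location="bigdataprogrammers/op/"

# Pig script location from Step 3.
pig_script="hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig"

# Name of the Pig launcher (hypothetical override; defaults to pig).
PIG_BIN="${PIG_BIN:-pig}"

# Arguments: $1 = input location, $2 = employee threshold, $3 = output location.
run_daily_audit() {
  "$PIG_BIN" -f "$pig_script" \
    -param "ip_loc=$1" \
    -param "no_of_emp=$2" \
    -param "op_loc=$3"
}

# Launch only when the Pig launcher is available on this machine.
if command -v "$PIG_BIN" >/dev/null 2>&1; then
  run_daily_audit "$Input_location" "$employees" "$output_location"
else
  echo "$PIG_BIN not found on PATH; skipping the Pig run" >&2
fi
```

Wrapping the call in a function keeps the mapping between shell variables and Pig parameters in one place, so the same script can be reused with different values.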

Another Way

Instead of passing the variables one by one on the command line, we can use a parameter file that holds all of them.
Let’s create a file named parameters.txt:

 
 
ip_loc=bigdataprogrammers/ip/
no_of_emp=3000
op_loc=bigdataprogrammers/op_new/

Define all the variables in it, then pass this file when calling the Pig script.
Below is the command:

 
 
pig -f "hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig" -param_file 'hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/parameters.txt'

Use -param_file to pass the parameter file.
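If the values come from shell variables anyway, the parameter file can be generated by the shell script instead of being edited by hand. The sketch below assumes a local temporary path (param_file is a hypothetical name) and writes one name=value pair per line, the same layout as parameters.txt above.

```shell
# Shell variables holding the values to hand over to Pig.
Input_location="bigdataprogrammers/ip/"
employees=3000
output_location="bigdataprogrammers/op_new/"

# Hypothetical local path for the generated file.
param_file="${TMPDIR:-/tmp}/daily_audit_params.txt"

# Write one name=value pair per line, overwriting any previous file.
{
  printf 'ip_loc=%s\n' "$Input_location"
  printf 'no_of_emp=%s\n' "$employees"
  printf 'op_loc=%s\n' "$output_location"
} > "$param_file"

# Show what was written so the values can be checked before the Pig run.
cat "$param_file"
```

The generated file can then be given to -param_file in place of the fixed parameters.txt, provided it sits at a location Pig can read in your setup.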

Wrapping Up

In real projects, the input and output locations usually depend on the business date being processed. Because the data is processed daily, the values change every day and cannot be hard-coded in the Pig script. In that case, we can assign the parameters in a shell script.
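A minimal sketch of date-based locations is shown below. It assumes, purely for illustration, that each day's data lives in a sub-directory named after the business date in YYYY-MM-DD form; build_locations is a hypothetical helper, and your directory layout may differ.

```shell
# Hypothetical helper: derive the locations for one business date.
# Argument: $1 = business date in YYYY-MM-DD form.
build_locations() {
  business_date="$1"
  Input_location="bigdataprogrammers/ip/${business_date}/"
  output_location="bigdataprogrammers/op/${business_date}/"
}

# Default to today's date; a scheduler could pass another date instead.
build_locations "$(date +%Y-%m-%d)"

echo "Input location:  $Input_location"
echo "Output location: $output_location"
```

The resulting variables can be passed to the Pig script with -param exactly as in Step 3, so the Pig script itself never needs to change from day to day.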

Keep learning.
