Requirement
You have a Pig script that expects some variables, and their values need to be passed in from a shell script. Say the Pig script is named daily_audit.pig and it expects the following three variables:
- ip_loc
- no_of_emp
- op_loc
Solution
Step 1: Pig Script
Let’s see the content of daily_audit.pig
Daily_Audit = Load '${ip_loc}' using PigStorage(',') As (Company:chararray,empl:int);
Audit = Filter Daily_Audit by empl>${no_of_emp};
store Audit INTO '${op_loc}' using PigStorage(',');
The script uses three variables, so three values need to be declared in the shell script. Focus on the variables here rather than on the logic of the Pig script.
Step 2: Assign Variables
Let’s declare these three variables in the shell script
Input_location=bigdataprogrammers/ip/
employees=3000
output_location=bigdataprogrammers/op/
You can change these values whenever you need to, which is the point of assigning them in the shell script. In real projects, these values often come from the output of another process.
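One way to take a value from another process is shell command substitution. The sketch below is only an illustration: get_threshold is a hypothetical stand-in for whatever upstream command produces the value, and it is not part of the original script.

```shell
#!/bin/sh
# Hypothetical stand-in for an upstream process that prints the threshold.
get_threshold() {
    echo 3000
}

# Command substitution captures the command's output in a shell variable.
employees=$(get_threshold)

# The locations stay as plain assignments, as in Step 2.
Input_location=bigdataprogrammers/ip/
output_location=bigdataprogrammers/op/

echo "employees=$employees"
```

Replacing the body of get_threshold with a real command leaves the rest of the script unchanged.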
Step 3: Call Pig Script
Once the variables are assigned, we can pass them while calling the Pig script.
Here is the command:
pig -f "hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig" -param ip_loc=$Input_location -param no_of_emp=$employees -param op_loc=$output_location
In the above command, “$” reads the value of a shell variable, and that value is assigned to the parameter defined in the Pig script. For example,
“ip_loc” is used in the Pig script, while “Input_location” is defined in the shell script. So you write ip_loc=$Input_location, where the first name is the Pig parameter and the second is the shell variable.
You must use -param once for each variable when calling the Pig script.
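Putting Steps 2 and 3 together, a wrapper script might look like the sketch below. The run helper and the DRY_RUN switch are hypothetical additions for illustration, so the command can be inspected before it is executed; the variable names and the script location are taken from the steps above.

```shell
#!/bin/sh
# Shell variables from Step 2.
Input_location=bigdataprogrammers/ip/
employees=3000
output_location=bigdataprogrammers/op/

# Script location copied from the command in Step 3.
script="hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig"

# Hypothetical helper: with DRY_RUN=1 (the default here) it only prints
# the command; with DRY_RUN=0 it executes the command.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# One -param per Pig variable; quoting keeps each value a single argument.
run pig -f "$script" \
    -param ip_loc="$Input_location" \
    -param no_of_emp="$employees" \
    -param op_loc="$output_location"
```

Quoting the shell variables is a defensive choice: it keeps a value intact if it ever contains spaces.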
Another Way
Instead of passing each variable on the command line, we can use a parameter file that holds all of them.
Let’s create a file named parameters.txt
ip_loc=bigdataprogrammers/ip/
no_of_emp=3000
op_loc=bigdataprogrammers/op_new/
Define all variables in it.
When calling the Pig script, pass this file along with it.
Below is the command:
pig -f "hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig" -param_file 'hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/parameters.txt'
Use -param_file to pass the parameter file.
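If the values still originate in a shell script, the parameter file can be generated from shell variables. This is a minimal sketch, assuming the file is written to the current local directory; copying it to the HDFS location used in the command above is left out.

```shell
#!/bin/sh
# Shell variables holding the values for this run.
Input_location=bigdataprogrammers/ip/
employees=3000
output_location=bigdataprogrammers/op_new/

# Local file name; an assumption made for this sketch.
param_file=parameters.txt

# One name=value pair per line. The shell expands the variables
# inside the heredoc before the file is written.
cat > "$param_file" <<EOF
ip_loc=$Input_location
no_of_emp=$employees
op_loc=$output_location
EOF

cat "$param_file"
```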
Wrapping Up
In real-life projects, the input and output locations usually depend on the business date of the data being processed. Since we process data daily, the date changes with every run and cannot be hard-coded in the Pig script. In that case, we can assign the parameter(s) in a shell script and pass them to Pig.
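A date-based run could derive the locations as in the sketch below. The dated subdirectory layout and the BUSINESS_DATE environment variable are assumptions made for illustration; the original article does not specify how the date appears in the paths.

```shell
#!/bin/sh
# Business date in YYYY-MM-DD form; defaults to today when
# BUSINESS_DATE (a hypothetical variable) is not set.
business_date=${BUSINESS_DATE:-$(date +%Y-%m-%d)}

# Assumed layout: one subdirectory per business date.
Input_location=bigdataprogrammers/ip/$business_date/
output_location=bigdataprogrammers/op/$business_date/

echo "ip_loc=$Input_location"
echo "op_loc=$output_location"
```

These two variables can then be passed with -param exactly as in Step 3.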