Pass variables from a shell script to a Pig script

Requirement

You have a Pig script that expects some variables, and those variables need to be passed from a shell script. Say the Pig script is named daily_audit.pig and expects the following three variables:

  • ip_loc
  • no_of_emp
  • op_loc

Solution

Step 1: Pig Script

Let’s look at the content of daily_audit.pig:

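-- Load comma-separated (company, employee count) records from the input location
Daily_Audit = Load '${ip_loc}' using PigStorage(',') As (Company:chararray,empl:int);
-- Keep only companies with more than no_of_emp employees
Audit = Filter Daily_Audit by empl>${no_of_emp};
-- Write the filtered records to the output location
store Audit INTO '${op_loc}' using PigStorage(',');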

The script references three parameters, so all three must be declared in the shell script. Focus only on the variables here rather than on the logic of this Pig script.
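Before writing the shell script, it can help to list which parameters a Pig script expects. The following is a minimal sketch, assuming parameters are referenced in the ${name} form used in daily_audit.pig; the function name list_pig_params is hypothetical, and it does not pick up bare $name references.

```shell
# Hypothetical helper: print each ${name} placeholder used in a Pig script, once.
# Assumes the ${name} syntax used in daily_audit.pig; bare $name references are not matched.
list_pig_params() {
  local script="$1"
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$script" \
    | sed -e 's/^\${//' -e 's/}$//' \
    | sort -u
}
```

Running list_pig_params daily_audit.pig against the script above should print ip_loc, no_of_emp and op_loc, one per line.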

Step 2: Assign the variables

Let’s declare these three variables in the shell script:

Input_location=bigdataprogrammers/ip/
employees=3000
output_location=bigdataprogrammers/op/

You can change these values whenever you need to; that is the point of assigning them in the shell script rather than in the Pig script. In real projects, these values are often set from the output of another process.
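One simple way to make these values easy to change is to let the caller override them and fall back to the defaults above. The sketch below assumes the three values arrive as positional arguments to the shell script; the function name set_audit_vars and the argument order are illustrative choices, not something Pig requires.

```shell
# Hypothetical helper: take the locations and threshold from positional arguments,
# falling back to the Step 2 defaults when an argument is missing or empty.
set_audit_vars() {
  Input_location="${1:-bigdataprogrammers/ip/}"
  employees="${2:-3000}"
  output_location="${3:-bigdataprogrammers/op/}"
}

# Call it near the top of the wrapper script with the script's own arguments.
set_audit_vars "$@"
```

With this in place, passing bigdataprogrammers/ip_2/ 5000 to the wrapper script would override the first two values and keep the default output location.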

Step 3: Call Pig Script

Once the variables are assigned, we can pass them when calling the Pig script.
Here is the command:

 pig -f "hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig" -param ip_loc="$Input_location" -param no_of_emp="$employees" -param op_loc="$output_location"

In the above command, the shell replaces each “$” reference with the value of a shell variable, and -param assigns that value to a parameter defined in the Pig script. For example, ip_loc is used in the Pig script, while Input_location is defined in the shell script. So you write ip_loc=$Input_location, where the name on the left is the Pig parameter and the variable on the right is the shell variable.

You must use a separate -param option for each variable when calling the Pig script.
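Building the command in a bash array keeps each -param key=value pair as a single argument, even if a value contains spaces. The sketch below is illustrative: the function name run_daily_audit and the DRY_RUN switch are hypothetical, and the HDFS script path is the one used in the command above.

```shell
# Hypothetical wrapper: collect the Pig arguments in an array so each
# -param key=value pair stays one word, then run (or just print) the command.
run_daily_audit() {
  local script="hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig"
  local -a args=(
    -f "$script"
    -param "ip_loc=$Input_location"
    -param "no_of_emp=$employees"
    -param "op_loc=$output_location"
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command, one argument per line, instead of executing it.
    printf '%s\n' pig "${args[@]}"
  else
    pig "${args[@]}"
  fi
}
```

Running DRY_RUN=1 run_daily_audit first is a cheap way to check what Pig will receive before launching a real job.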

Another Way

Instead of passing the variables one by one on the command line, we can use a parameter file that contains all of them.
Let’s create a file named parameters.txt:

ip_loc=bigdataprogrammers/ip/
no_of_emp=3000
op_loc=bigdataprogrammers/op_new/

Define all the variables in this file, and pass the file when calling the Pig script.
Below is the command:

 pig -f "hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/daily_audit.pig" -param_file 'hdfs://sandbox.hortonworks.com:8020/user/root/bigdataprogrammers/parameters.txt'

Use -param_file to pass the parameter file.
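Since the values usually come from the shell anyway, a wrapper can also write the parameter file itself and then pass it with -param_file. This is a minimal sketch: the function name write_param_file is hypothetical, the file is written to the local file system, and each line follows the key=value format shown in parameters.txt.

```shell
# Hypothetical helper: write the three Pig parameters as key=value lines
# to the given file, taking the values from the shell variables.
write_param_file() {
  local file="$1"
  {
    printf 'ip_loc=%s\n' "$Input_location"
    printf 'no_of_emp=%s\n' "$employees"
    printf 'op_loc=%s\n' "$output_location"
  } > "$file"
}
```

For example, write_param_file parameters.txt followed by pig -f daily_audit.pig -param_file parameters.txt, assuming both files sit in the current local directory.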

Wrapping Up

In real-life projects, the input and output locations usually depend on the business date of the data being processed. Because data is processed daily, the date changes every day and cannot be hard-coded in the Pig script. In that case, we assign the parameters in a shell script and pass them to Pig.
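A minimal sketch of that pattern: the business date is taken as an argument (defaulting to today) and appended to the input and output locations. The date-partitioned directory layout and the function name set_dated_locations are assumptions for illustration; adapt them to how your data is actually laid out.

```shell
# Hypothetical helper: derive date-partitioned locations from a business date
# in YYYY-MM-DD form; uses today's date when no argument is given.
set_dated_locations() {
  local business_date="${1:-$(date +%Y-%m-%d)}"
  Input_location="bigdataprogrammers/ip/${business_date}/"
  output_location="bigdataprogrammers/op/${business_date}/"
}
```

After calling set_dated_locations 2024-01-15, the pig command from Step 3 can be run unchanged, since it already reads Input_location and output_location.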
