Pass variables from shell script to hive script

Requirement

You have a Hive script that expects some variables, and those variables need to be passed in from a shell script. Say the Hive script is named daily_audit.hql and it expects three variables:
• schema
• tablename
• no_of_employees

Solution

Step 1: Hive Script

Let’s see the content of daily_audit.hql script:

select * from ${hiveconf:schema}.${hiveconf:tablename} where total_emp>${hiveconf:no_of_employees};

The script reads three variables from the hiveconf namespace: schema, tablename, and no_of_employees. All three must be supplied by the shell script. Focus on the variables here rather than on the query logic.

Step 2: Assignment of variables

Let’s declare these three variables in the shell script:

myschema=bdp
mytablename=infostore
noOftotal_emp=5000

You can change these values whenever you need to query a different schema or table; that is the point of assigning them in the shell script. In real projects, these values usually come from the output of another process.
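As a small illustration of taking a value from another process, the sketch below uses shell command substitution. The get_threshold function is hypothetical and only stands in for whatever upstream step produces the number.

```shell
#!/bin/sh
# Hypothetical stand-in for an upstream process that prints a threshold.
get_threshold() {
  echo 5000
}

myschema=bdp
mytablename=infostore
# Command substitution captures the process output in the shell variable.
noOftotal_emp=$(get_threshold)

echo "schema=$myschema tablename=$mytablename no_of_employees=$noOftotal_emp"
```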

Step 3: Call HQL

Once the variables are assigned, we can pass them when calling the HQL script.
Here is the command:

hive -f "/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/daily_audit.hql" --hiveconf schema="$myschema" --hiveconf tablename="$mytablename" --hiveconf no_of_employees="$noOftotal_emp"

In the above command, “$” expands a shell variable, and its value is assigned to the variable used in the HQL script. For example, “schema” is referenced in the Hive script, while “myschema” is defined in the shell script. So you write schema=$myschema, where the left side is the Hive variable and the right side is the shell variable. You must repeat --hiveconf for each variable when calling the Hive script.
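Steps 2 and 3 can be combined into one small wrapper script. This is a minimal sketch, not a definitive implementation: the relative hql_file path, the run_daily_audit function, and the DRY_RUN switch are all assumptions added for illustration. By default it only prints the command, so it can be tried on a machine without Hive.

```shell
#!/bin/sh
# Step 2: shell variables (values taken from the example above).
myschema=bdp
mytablename=infostore
noOftotal_emp=5000

# Hypothetical relative path; point this at your own daily_audit.hql.
hql_file="daily_audit.hql"

run_daily_audit() {
  # Collect the arguments as positional parameters so that each
  # name=value pair is passed to hive as a single word.
  set -- -f "$hql_file" \
    --hiveconf "schema=$myschema" \
    --hiveconf "tablename=$mytablename" \
    --hiveconf "no_of_employees=$noOftotal_emp"

  # DRY_RUN is a hypothetical switch: 1 (default) prints, 0 executes.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hive $*"
  else
    hive "$@"
  fi
}

# Step 3: call the HQL script.
run_daily_audit
```

Run it with DRY_RUN=0 to execute the query for real once the printed command looks right.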

Another Way

Instead of passing the variables one by one on the command line, we can use a parameter file that holds all of them.
Let’s create a file named hiveparam.txt:

set schema=bdp;
set tablename=infostore;
set no_of_employees=5000;

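Define each variable with the set command. A plain set stores the value as a Hive configuration (hiveconf) variable, so the ${hiveconf:...} references in the script still resolve.
Then pass this file along when calling the HQL script.
The command is: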

hive -f "/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/daily_audit.hql" -i "/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/hiveparam.txt"

Use -i to pass the parameter file. Hive runs it as an initialization script before executing the file given with -f.
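If the values live in shell variables, the parameter file itself can be generated by the shell script. The sketch below assumes hypothetical paths for param_file and hql_file and a hypothetical DRY_RUN switch; by default it writes the file and prints the hive command instead of running it.

```shell
#!/bin/sh
myschema=bdp
mytablename=infostore
noOftotal_emp=5000

# Hypothetical locations; adjust both paths to your environment.
param_file="${TMPDIR:-/tmp}/hiveparam.txt"
hql_file="daily_audit.hql"

# Write one "set name=value;" line per variable.
cat > "$param_file" <<EOF
set schema=$myschema;
set tablename=$mytablename;
set no_of_employees=$noOftotal_emp;
EOF

# Show what was generated.
cat "$param_file"

# DRY_RUN is a hypothetical switch: 1 (default) prints, 0 executes.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "hive -f $hql_file -i $param_file"
else
  hive -f "$hql_file" -i "$param_file"
fi
```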

Wrapping Up

In real-life projects, we often use partitioned Hive tables, typically partitioned on a date. Because data is processed daily, the date changes on every run and can’t be hard-coded in the HQL script. In that case, assign the parameter(s) in a shell script and pass them to Hive using either approach above.
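For the daily date case, a sketch could look like the following. The variable name run_date, the script name daily_load.hql, the column load_date, and the DRY_RUN switch are all hypothetical; only the --hiveconf mechanism comes from the steps above.

```shell
#!/bin/sh
# Today's date in YYYY-MM-DD form, computed on every run.
run_date=$(date +%Y-%m-%d)

# Hypothetical HQL file that filters on a date partition, for example:
#   select * from ${hiveconf:schema}.${hiveconf:tablename}
#   where load_date='${hiveconf:run_date}';
hql_file="daily_load.hql"

# DRY_RUN is a hypothetical switch: 1 (default) prints, 0 executes.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "hive -f $hql_file --hiveconf run_date=$run_date"
else
  hive -f "$hql_file" --hiveconf "run_date=$run_date"
fi
```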

Keep learning.
