Requirement
You have a Hive script that expects some variables, and those variables need to be passed in from a shell script. Say the Hive script is named daily_audit.hql. It expects three variables:
• schema
• tablename
• no_of_employees
Solution
Step 1: Hive Script
Let’s look at the content of the daily_audit.hql script:
select * from ${hiveconf:schema}.${hiveconf:tablename} where total_emp>${hiveconf:no_of_employees};
The script references three hiveconf variables, so all three must be supplied by the shell script. Note that total_emp is a column in the query, not a variable. Focus on the variables here rather than on the query logic.
Step 2: Assign the variables
Let’s declare these three variables in the shell script:
myschema=bdp
mytablename=infostore
noOftotal_emp=5000
You can change these values whenever you need to query a different schema or table. That is the point of assigning the variables in a shell script. In real projects, these values are often taken from the output of another process.
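As a minimal sketch of that last point, a value can be captured from another command with command substitution. The get_threshold function below is a hypothetical stand-in for whatever upstream process produces the value; it is not part of the original script.

```shell
#!/bin/sh
# Hypothetical stand-in for an upstream process that produces the threshold.
get_threshold() {
  echo 5000
}

myschema=bdp
mytablename=infostore
# Capture the command's output into the shell variable.
noOftotal_emp=$(get_threshold)

echo "schema=$myschema table=$mytablename threshold=$noOftotal_emp"
```

In a real job, get_threshold would be replaced by the actual command whose output you want to pass on to Hive.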
Step 3: Call HQL
Once the variables are assigned, we can pass them when calling the HQL script.
Here is the command:
hive -f "/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/daily_audit.hql" --hiveconf schema=$myschema --hiveconf tablename=$mytablename --hiveconf no_of_employees=$noOftotal_emp;
In the above command, “$” reads the value of a shell variable, and that value is assigned to the variable used in the HQL. For example, “schema” is referenced in the Hive script, while “myschema” is defined in the shell script. So you write schema=$myschema, where the left side is the Hive script variable and the right side is the shell script variable. You must use --hiveconf for each variable you pass when calling the Hive script.
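Putting the steps together, a wrapper script might look like the sketch below. It reuses the path and values from this post, collects the arguments in "$@", and only prints the resulting command so it can be reviewed without a Hive installation. The hql_file variable name is an assumption made for illustration.

```shell
#!/bin/sh
# Path and values mirror the ones used in this post.
hql_file="/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/daily_audit.hql"
myschema=bdp
mytablename=infostore
noOftotal_emp=5000

# Build the argument list; quoting keeps each name=value pair a single argument.
set -- -f "$hql_file" \
  --hiveconf "schema=$myschema" \
  --hiveconf "tablename=$mytablename" \
  --hiveconf "no_of_employees=$noOftotal_emp"

# Print the command for review; drop the echo to actually run it.
echo hive "$@"
```

Quoting each pair, as in "schema=$myschema", keeps the argument intact even if a value ever contains spaces.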
Another Way
Instead of passing the variables one by one on the command line, we can use a parameter file that holds all of them.
Let’s create a file named hiveparam.txt:
set schema=bdp;
set tablename=infostore;
set no_of_employees=5000;
Each variable is defined with the set command.
When calling the HQL script, pass this file along with it.
Here is the command:
hive -f "/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/daily_audit.hql" -i "/root/local_bdp/posts/Pass-variables-from-shell-script-to-hive-script/hiveparam.txt"
Use -i to pass the parameter file. Hive runs it as an initialization file before executing the script.
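If the values still come from shell variables, the parameter file can be generated by the shell script itself. The following is a sketch under that assumption; the param_file location is hypothetical and should be adjusted to your environment.

```shell
#!/bin/sh
# Hypothetical location for the generated parameter file.
param_file="${TMPDIR:-/tmp}/hiveparam.txt"

myschema=bdp
mytablename=infostore
noOftotal_emp=5000

# Write one "set name=value;" line per variable; the shell expands the values.
cat > "$param_file" <<EOF
set schema=$myschema;
set tablename=$mytablename;
set no_of_employees=$noOftotal_emp;
EOF

# Show the generated file for review.
cat "$param_file"
```

The generated file can then be passed with -i in the same way as the hand-written hiveparam.txt.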
Wrapping Up
In real-life projects, we often use partitioned Hive tables, typically partitioned on a date. Because the data is processed daily, at least one variable, the date, changes every day and cannot be hard-coded in the HQL script. In that case, we can assign the parameter(s) in a shell script and pass them to Hive.
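For the daily case, the date can be computed in the shell script at run time. This is only a sketch: run_date is a hypothetical variable name, the YYYY-MM-DD format is an assumption about how the partition is stored, and the command is printed rather than executed.

```shell
#!/bin/sh
# Today's date in YYYY-MM-DD form; the format is an assumption about the partition column.
run_date=$(date +%Y-%m-%d)

# run_date is a hypothetical variable; the HQL would read it as ${hiveconf:run_date}.
set -- -f daily_audit.hql --hiveconf "run_date=$run_date"

# Print the command for review; drop the echo to actually run it.
echo hive "$@"
```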
Keep learning.