April 2017

Requirement: You have a Hive script that expects some variables, and the variables need to be passed from a shell script. Say the Hive script is named daily_audit.hql. It expects three variables: schema, tablename, and total_emp. Solution: Step 1: Hive Script. Let’s… Read More →
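One way the shell side of this could look is sketched below. It relies on the Hive CLI's `--hivevar name=value` and `-f` options. The three values are hypothetical placeholders, not taken from the post, and the script only runs Hive when both the CLI and daily_audit.hql are present.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace with real ones.
schema="audit_db"
tablename="employee"
total_emp="100"

# Assemble the arguments as positional parameters so each stays a single word.
set -- --hivevar "schema=${schema}" \
       --hivevar "tablename=${tablename}" \
       --hivevar "total_emp=${total_emp}" \
       -f daily_audit.hql

# Show the command that would be executed.
hive_cmd="hive $*"
echo "$hive_cmd"

# Run it only when the hive CLI and the script file are actually available.
if command -v hive >/dev/null 2>&1 && [ -f daily_audit.hql ]; then
  hive "$@"
fi
```

Inside daily_audit.hql, values passed this way can be referenced as `${hivevar:schema}`, `${hivevar:tablename}`, and `${hivevar:total_emp}`.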

Requirement: In the source data, you have each user’s mobile connection type and Id. There are four possible connection types: “POSTP, PREP, CLS, PEND”. You need the Id of only those users whose connection type is POSTP, PREP, blank, or null. If the blank is… Read More →
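The filter rule itself does not depend on the tool. As a rough illustration only, here it is applied with awk to a few hypothetical comma-separated rows. The sample data, the column order, and the use of the literal text NULL to stand in for a null value are all assumptions, and this is not the post's own solution.

```shell
#!/bin/sh
# Hypothetical sample rows in the form id,connection_type.
# The literal "NULL" stands in for a null value in this sketch.
sample='1,POSTP
2,PREP
3,CLS
4,PEND
5,
6,NULL'

# Keep the id when the trimmed type is POSTP, PREP, blank, or null.
kept_ids=$(printf '%s\n' "$sample" | awk -F',' '
  { ctype = $2; gsub(/^[ \t]+|[ \t]+$/, "", ctype) }
  ctype == "POSTP" || ctype == "PREP" || ctype == "" || ctype == "NULL" { print $1 }
')

echo "$kept_ids"
```

Trimming whitespace before the comparison means a field containing only spaces is treated as blank as well.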

Requirement: We generally receive data from different sources, which usually use different date formats. When we create a Hive table on top of this data, the dates must be converted into a format that Hive supports. Hive supports the yyyy-MM-dd date format. So output format of… Read More →
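A conversion of this kind can be sketched with Hive's built-in `unix_timestamp` and `from_unixtime` functions. The table name staging_dates, the column raw_date, and the source pattern dd/MM/yyyy are hypothetical choices for illustration. The script prints the query and runs it only when RUN_HIVE=1 is set.

```shell
#!/bin/sh
# Assumed source pattern; table and column names are hypothetical.
src_format='dd/MM/yyyy'

# Parse the string with the source pattern, then format it as yyyy-MM-dd.
query="SELECT from_unixtime(unix_timestamp(raw_date, '${src_format}'), 'yyyy-MM-dd') AS formatted_date FROM staging_dates;"

echo "$query"

# Opt-in execution, so the sketch is safe to run where Hive is not set up.
if [ "${RUN_HIVE:-0}" = "1" ] && command -v hive >/dev/null 2>&1; then
  hive -e "$query"
fi
```

Each distinct source format needs its own pattern string. The output pattern stays yyyy-MM-dd.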

Requirement: You have two tables named A and B, and you want to perform all types of join in Pig. This will help you understand how joins work in Pig. Solution: Step 1: Input Files. Download the two input files from here and place them in a local directory. File A… Read More →

Requirement: You have a comma-separated file and you want to create an ORC-formatted table in Hive on top of it. Follow the steps below. Solution: Step 1: Sample CSV File. Create a sample CSV file named sample_1.csv. Download from here (You can skip this step… Read More →

Requirement: You have a Hive table named infostore in the bdp schema. Another application is connected to yours, but for security reasons it is not allowed to read data from the Hive table. The data of the infostore table must be sent into… Read More →