Requirement: You have a table in Hive and need to process its data using Pig. To load data directly from a file we generally use PigStorage(), but to load data from a Hive table we need a different load function. Let’s go through it step by step. Solution: Step …

Requirement: Suppose the source data is in a file in text format. The requirement is to load the text file into a Hive table using Spark and, in addition, to read the data back from the Hive table using Spark. Let’s break the task into sub-tasks: Load …

Requirement: In the last post, we demonstrated how to load JSON data into a Hive non-partitioned table. This time we use the same sample JSON data. The requirement is to load JSON data into a Hive partitioned table using Spark. The Hive table will be partitioned by one or more columns. The below task …

Requirement: You have a Hive script that expects some variables to be passed from a shell script. Say the Hive script is named daily_audit.hql and it expects three variables: schema, tablename and total_emp. Solution: Step 1: Let’s look at the content of daily_audit.hql …
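A minimal sketch of the shell-to-Hive hand-off, assuming the Hive CLI is used: each variable is passed as `--hivevar name=value` and the script is run with `-f`; inside daily_audit.hql the values are then read as `${hivevar:schema}` (or plain `${schema}`). The helper name and the three values below are hypothetical, not taken from the original post.

```python
import shlex


def build_hive_command(script, variables):
    """Build a Hive CLI argument list that passes each variable with --hivevar.

    Inside the script, a variable is referenced as ${hivevar:name} or ${name}.
    build_hive_command is a hypothetical helper for illustration.
    """
    cmd = ["hive"]
    for name, value in variables.items():
        cmd += ["--hivevar", f"{name}={value}"]
    cmd += ["-f", script]
    return cmd


# Hypothetical values for the three variables daily_audit.hql expects.
args = build_hive_command(
    "daily_audit.hql",
    {"schema": "bdp", "tablename": "employee", "total_emp": 100},
)

# Print the equivalent shell command line.
print(shlex.join(args))
```

To actually run it, the list can be handed to `subprocess.run(args, check=True)`; passing a list rather than one string avoids shell-quoting problems when a value contains spaces. Beeline accepts the same `--hivevar` option.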

Requirement: We generally receive data from different sources, which usually use different date formats. When we create a Hive table on top of this data, it becomes necessary to convert the dates into a format that Hive supports. Hive supports the yyyy-MM-dd date format. So the output format of …
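In HiveQL a single known pattern is usually converted with `from_unixtime(unix_timestamp(col, 'dd/MM/yyyy'), 'yyyy-MM-dd')`. The sketch below shows the same normalisation in plain Python for data with several possible input patterns; the list of patterns is a hypothetical example, not the formats from the original post.

```python
from datetime import datetime

# Hypothetical source patterns, tried in order. The first pattern that parses wins,
# so put the most trusted format first when inputs can be ambiguous (e.g. 01/02/2017).
INPUT_FORMATS = ["%d/%m/%Y", "%m-%d-%Y", "%Y%m%d"]


def to_hive_date(raw):
    """Return raw rewritten as yyyy-MM-dd, or None if no known pattern matches."""
    if raw is None:
        return None
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this pattern does not fit; try the next one
    return None


for value in ["25/12/2017", "12-25-2017", "20171225", "not a date"]:
    print(value, "->", to_hive_date(value))
```

Returning None for unparseable values keeps bad rows visible as NULLs once loaded, instead of silently guessing a date.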

Requirement: You have a comma-separated file and want to create an ORC-formatted table in Hive on top of it. Follow the steps below. Solution: Step 1: Sample CSV file. Create a sample CSV file named sample_1.csv, or download it from here (you can skip this step …
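One common way to get CSV data into ORC, sketched here under assumptions rather than as the post’s exact steps: expose the CSV directory as a comma-delimited text table, create an ORC table with the same schema, then `INSERT OVERWRITE ... SELECT` so Hive rewrites the rows as ORC. The Python helper only assembles the HiveQL; the database, table, column names and HDFS path are hypothetical.

```python
def csv_to_orc_statements(db, table, columns, hdfs_csv_dir):
    """Return HiveQL that copies a CSV-backed text table into an ORC table.

    All names passed in are hypothetical placeholders.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    staging = f"{db}.{table}_csv"
    return [
        # 1. Text table over the existing CSV files (external: Hive does not own the files).
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {staging} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_csv_dir}'",
        # 2. ORC table with the same schema.
        f"CREATE TABLE IF NOT EXISTS {db}.{table}_orc (\n  {cols}\n)\nSTORED AS ORC",
        # 3. Hive writes the selected rows in ORC format.
        f"INSERT OVERWRITE TABLE {db}.{table}_orc SELECT * FROM {staging}",
    ]


for stmt in csv_to_orc_statements(
    "bdp",
    "sample_1",
    [("id", "INT"), ("name", "STRING"), ("salary", "DOUBLE")],
    "/user/bdp/sample_1",
):
    print(stmt + ";\n")
```

A direct `LOAD DATA` into the ORC table would not work here, because `LOAD DATA` only moves files and does not convert CSV text into ORC.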

Requirement: You have a Hive table named infostore in the bdp schema. Another application is connected to yours, but for security reasons it is not allowed to read data from the Hive table. It is therefore required to send the data of the infostore table into …

Requirement: Suppose the source data is in JSON format. The requirement is to load the JSON data into a Hive non-partitioned table using Spark. Let’s break the requirement into two tasks: load the JSON data into a Spark DataFrame and read it, then store it in a Hive non-partitioned table. Components …

Requirement: Suppose there is source data that needs to be stored in a Hive partitioned table. Our requirement is to store the data in the Hive table using static and dynamic partitions. With an understanding of partitioning in Hive, we will see where to use static and …
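A small sketch of the difference, with hypothetical table and column names: a static insert names the partition value in the `PARTITION` clause, while a dynamic insert lets Hive take the value from the last column of the `SELECT` (after enabling `hive.exec.dynamic.partition` and, when no static partition is given, `hive.exec.dynamic.partition.mode=nonstrict`). The Python function only mimics how dynamic partitioning routes rows into `column=value` directories; it is an illustration, not Hive’s implementation.

```python
from collections import defaultdict

# Hypothetical tables: sales (partitioned by country) loaded from sales_stage.
STATIC_INSERT = (
    "INSERT OVERWRITE TABLE sales PARTITION (country='US')\n"
    "SELECT id, amount FROM sales_stage WHERE country = 'US'"
)
DYNAMIC_INSERT = (
    "SET hive.exec.dynamic.partition=true;\n"
    "SET hive.exec.dynamic.partition.mode=nonstrict;\n"
    "INSERT OVERWRITE TABLE sales PARTITION (country)\n"
    # The dynamic partition column must come last in the SELECT list.
    "SELECT id, amount, country FROM sales_stage"
)


def route_to_partitions(rows, partition_col):
    """Mimic dynamic partitioning: group rows into key=value directories.

    The partition column is dropped from the stored row, because Hive keeps
    its value in the directory path instead of in the data files.
    """
    partitions = defaultdict(list)
    for row in rows:
        path = f"{partition_col}={row[partition_col]}"
        partitions[path].append({k: v for k, v in row.items() if k != partition_col})
    return dict(partitions)


rows = [
    {"id": 1, "amount": 10.0, "country": "US"},
    {"id": 2, "amount": 20.0, "country": "IN"},
    {"id": 3, "amount": 30.0, "country": "US"},
]
for path, part_rows in route_to_partitions(rows, "country").items():
    print(path, part_rows)
```

Static partitions suit loads where the target partition is known up front (for example one day’s file); dynamic partitions suit mixed data where each row decides its own partition.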

Requirement: Suppose you have an XML-formatted data file that contains some empty tags. The requirement is to parse the XML data in Hive and read it while handling the tags that are empty in the source data. Components Involved: Hive, Maven, Java. Solution: There are many solutions …
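The post itself uses Hive with Java and Maven; the sketch below swaps in Python’s standard `xml.etree.ElementTree` purely to show the empty-tag handling idea: both `<city/>` and `<phone></phone>` have no text, so each becomes None, which would map to NULL in a Hive column. The record layout, field names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second record has two empty tags.
SAMPLE = """<records>
  <record><id>1</id><name>A</name><city>CityA</city><phone>123</phone></record>
  <record><id>2</id><name>B</name><city/><phone></phone></record>
</records>"""

FIELDS = ["id", "name", "city", "phone"]


def parse_records(xml_text):
    """Flatten each <record> into a dict; empty, blank or missing tags become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("record"):
        row = {}
        for field in FIELDS:
            elem = record.find(field)
            # elem.text is None for <city/> and for <phone></phone>.
            text = elem.text.strip() if elem is not None and elem.text else ""
            row[field] = text or None
        rows.append(row)
    return rows


for row in parse_records(SAMPLE):
    print(row)
```

Treating "missing", "empty" and "whitespace only" the same way keeps the downstream table consistent, whichever tool does the parsing.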

Requirement: You have a table in Hive with one column, and you want to split this column into multiple columns and store the results in another Hive table. Solution: Assume the Hive table is named “transact_tbl” and has one column named “connections”, and the values in the connections column are …
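In HiveQL this is usually done by indexing the array returned by `split()`. The excerpt does not show the actual values, so the comma delimiter, the three output columns, the sample strings and the target table `transact_split` below are all assumptions. The Python function mirrors the logic: fewer parts than columns yields None (Hive returns NULL for an index past the end of the array), and extra parts are ignored.

```python
def split_column(value, delimiter=",", n_columns=3):
    """Split one delimited string into exactly n_columns values, padding with None."""
    parts = value.split(delimiter) if value is not None else []
    parts = parts[:n_columns]  # extra parts are ignored, like selecting only [0]..[n-1]
    return parts + [None] * (n_columns - len(parts))


# Hypothetical connections values; the real data and delimiter may differ.
for connections in ["a,b,c", "a,b", None]:
    print(split_column(connections))

# HiveQL equivalent under the same assumptions (hypothetical target table).
HIVEQL = """INSERT OVERWRITE TABLE transact_split
SELECT split(connections, ',')[0] AS col1,
       split(connections, ',')[1] AS col2,
       split(connections, ',')[2] AS col3
FROM transact_tbl"""
print(HIVEQL)
```

Note that Hive’s `split()` takes a regular expression, so a delimiter such as `|` has to be escaped (`'\\|'`), while a comma can be used as-is.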

Requirement: You have a comma-separated file and want to create a Hive table on top of it (that is, load a CSV file into Hive). Solution: Step 1: Sample CSV file. Create a sample CSV file named sample_1.csv, or download it from here (you can skip this …
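A minimal sketch, assuming a plain comma-delimited text table and a CSV file on the local machine of the Hive client: create the table with `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','`, then load the file with `LOAD DATA LOCAL INPATH`. The helper only builds the HiveQL strings; the table name, columns and file path are hypothetical.

```python
def csv_table_ddl(table, columns, local_csv_path):
    """Return HiveQL that creates a comma-delimited text table and loads a CSV file.

    Table, column and path names are hypothetical placeholders.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "LINES TERMINATED BY '\\n'\n"
        "STORED AS TEXTFILE"
    )
    # LOCAL reads from the client's filesystem; without LOCAL the path is on HDFS
    # and Hive moves the file into the table's directory.
    load = f"LOAD DATA LOCAL INPATH '{local_csv_path}' INTO TABLE {table}"
    return [create, load]


for stmt in csv_table_ddl(
    "sample_1", [("id", "INT"), ("name", "STRING")], "/tmp/sample_1.csv"
):
    print(stmt + ";")
```

This simple delimited format does not understand quoted fields that contain commas; for such files Hive’s `org.apache.hadoop.hive.serde2.OpenCSVSerde` is the usual alternative.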

Requirement: You have a CSV file at an HDFS location, and you want to create a Hive layer on top of this data. However, the CSV file has two header lines at the top, and you don’t want them to appear in your Hive table. Let’s solve …
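Hive can skip leading header lines itself through the table property `skip.header.line.count`. The sketch below shows a hypothetical external table definition using it, plus a plain-Python equivalent of the skipping so the effect is easy to check. The table name, columns, HDFS path and sample text are assumptions, not details from the original post.

```python
import csv
import io
from itertools import islice

HEADER_LINES = 2  # the two header lines described in the requirement

# Hive-side sketch; table name, columns and HDFS path are hypothetical.
DDL = f"""CREATE EXTERNAL TABLE IF NOT EXISTS emp_csv (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/bdp/emp_csv'
TBLPROPERTIES ("skip.header.line.count"="{HEADER_LINES}")"""


def read_without_headers(text, header_lines=HEADER_LINES):
    """Parse CSV text, skipping the first header_lines rows, as the property asks Hive to do."""
    reader = csv.reader(io.StringIO(text))
    return list(islice(reader, header_lines, None))


# Hypothetical file content with two header lines.
SAMPLE = "report generated daily\nid,name\n1,A\n2,B\n"
print(DDL)
print(read_without_headers(SAMPLE))
```

Using the table property keeps the raw file untouched in HDFS, so no pre-processing job is needed just to strip the headers.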