Requirement: You have a comma-separated values (CSV) file and want to create a Parquet table in Hive on top of it. Follow the steps below. Solution: Step 1: Sample CSV file. Create a sample CSV file named sample_1.csv, or download it from here (you can skip this step if you already…
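As a rough sketch of where these steps usually lead, the Python snippet below only assembles the HiveQL for the common staging-table approach: an external text table over the CSV, an empty table declared `STORED AS PARQUET`, and an `INSERT OVERWRITE ... SELECT` that makes Hive rewrite the rows as Parquet. The column list, table names and HDFS path are hypothetical, since the layout of sample_1.csv is not shown here; the printed statements would be run in the Hive shell or Beeline.

```python
from typing import List

# Hypothetical column list; the real sample_1.csv layout is not shown in the excerpt.
COLUMNS = [("id", "INT"), ("name", "STRING"), ("city", "STRING")]


def csv_to_parquet_hql(staging: str, target: str, hdfs_dir: str) -> List[str]:
    """Build the HiveQL statements for the staging-table approach."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in COLUMNS)
    return [
        # 1. Text-format external table that reads the CSV files in place.
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {staging} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"STORED AS TEXTFILE\nLOCATION '{hdfs_dir}'",
        # 2. Empty table with the same columns, stored as Parquet.
        f"CREATE TABLE IF NOT EXISTS {target} (\n  {cols}\n)\nSTORED AS PARQUET",
        # 3. Hive reads the text rows and writes them out as Parquet files.
        f"INSERT OVERWRITE TABLE {target} SELECT * FROM {staging}",
    ]


if __name__ == "__main__":
    # Hypothetical table names and HDFS directory.
    for stmt in csv_to_parquet_hql("sample_csv", "sample_parquet", "/user/hive/sample_csv"):
        print(stmt + ";\n")
```

Keeping the CSV behind a separate text table means the original file stays untouched, and the conversion is just a query Hive already knows how to run.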

Requirement: You have two tables named A and B, and you want to perform all types of joins on them in Hive. This will help you understand how joins work in Hive. Solution: Step 1: Input files. Download the two input files from here and place them in a local directory. File…
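To see what each join type returns before running it in Hive, here is a small in-memory Python sketch of inner, left outer, right outer, full outer and left semi joins. The rows for A and B are made up for illustration, and unmatched sides are filled with `None` where Hive would return NULL.

```python
# Hypothetical rows (id, value) for tables A and B; the post's real input files are not shown.
A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(2, "b2"), (3, "b3"), (4, "b4")]


def join(left, right, how="inner"):
    """Join two lists of (key, value) rows on key.

    how: "inner", "left", "right", "full" (outer joins) or "semi" (LEFT SEMI JOIN).
    A missing side is filled with None, the way Hive fills it with NULL.
    """
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)

    if how == "semi":
        # LEFT SEMI JOIN: left rows that have at least one match, left columns only.
        return [(key, lv) for key, lv in left if key in right_by_key]

    out = []
    for key, lv in left:
        matches = right_by_key.get(key)
        if matches:
            out.extend((key, lv, rv) for rv in matches)
        elif how in ("left", "full"):
            out.append((key, lv, None))

    if how in ("right", "full"):
        left_keys = {key for key, _ in left}
        # Right rows with no partner on the left.
        out.extend((key, None, rv) for key, rv in right if key not in left_keys)
    return out


if __name__ == "__main__":
    for how in ("inner", "left", "right", "full", "semi"):
        print(how, join(A, B, how))
```

The sketch prints a single key column taken from whichever side has a row; in Hive, `A.id` or `B.id` would be NULL for the missing side.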

Requirement: The number of columns in a Hive table is not known in advance. Sometimes a table has many columns and sometimes only a few. If we want the values of all the columns in the table, there is no challenge, as…

Requirement: In this post, we are going to understand what __HIVE_DEFAULT_PARTITION__ is in Hive and why it gets created. Components Involved: HDFS, Hive. Sample Data: We will use the sample data below for this task. It contains employee details. Some employees are members of the company's sports…
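The rule behind this partition can be sketched in a few lines of Python: during a dynamic-partition insert, a row whose partition column is NULL or an empty string cannot form a directory name such as `sport=cricket`, so Hive writes it under the name set by `hive.exec.default.partition.name`, which defaults to `__HIVE_DEFAULT_PARTITION__`. The column name `sport` and the rows are hypothetical.

```python
from collections import defaultdict

# Default value of Hive's hive.exec.default.partition.name setting.
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def route_to_partitions(rows, partition_col):
    """Group rows into partition directories the way a dynamic-partition insert does.

    A NULL (None), empty or missing partition value cannot form a directory
    name like sport=<value>, so such rows land in the default partition.
    """
    parts = defaultdict(list)
    for row in rows:
        value = row.get(partition_col)
        key = DEFAULT_PARTITION if value is None or value == "" else str(value)
        parts[f"{partition_col}={key}"].append(row)
    return dict(parts)


if __name__ == "__main__":
    # Hypothetical employees; only some belong to a sports team.
    employees = [
        {"name": "Ravi", "sport": "cricket"},
        {"name": "Anu", "sport": None},
        {"name": "Meera", "sport": "football"},
    ]
    for directory, members in route_to_partitions(employees, "sport").items():
        print(directory, [e["name"] for e in members])
```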

Requirement: Suppose we have a Hive table partitioned by year of joining. Our requirement is to drop multiple partitions from it. Components Involved: Hive, HDFS. Sample Data: Let's say we have the following sample data. Here, each record belongs to its own partition, as we…
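Hive lets a single `ALTER TABLE ... DROP` statement list several `PARTITION` specs separated by commas. The Python helper below only builds such a statement; the table name `employee` and the partition column `year_of_joining` are hypothetical stand-ins for the post's table.

```python
def drop_partitions_hql(table, column, values):
    """Build a single ALTER TABLE statement that drops several partitions.

    The PARTITION specs are comma-separated; IF EXISTS is included so that
    a partition which is already absent does not cause an error.
    """
    specs = ", ".join(f"PARTITION ({column}={value})" for value in values)
    return f"ALTER TABLE {table} DROP IF EXISTS {specs}"


if __name__ == "__main__":
    # Hypothetical table and column names; years are unquoted integer partition values.
    print(drop_partitions_hql("employee", "year_of_joining", [2012, 2013, 2014]) + ";")
```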

Requirement: There are two files containing employees' basic information. One file stores the details of employees who joined in 2012, and the other those of employees who joined in 2013. Now, we want to load these files into a Hive table that is partitioned…
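One common way to do this is one `LOAD DATA` statement per file, each naming its target partition explicitly. The Python helper below only builds those statements; the table name, partition column and file paths are hypothetical. Keep in mind that `LOAD DATA` copies (with `LOCAL`) or moves (without it) the files as-is, without parsing them, so the files must already match the table's row format.

```python
def load_partition_hql(table, partition_col, files_by_value):
    """Build one LOAD DATA statement per file, each targeting a static partition."""
    return [
        f"LOAD DATA LOCAL INPATH '{path}' INTO TABLE {table} "
        f"PARTITION ({partition_col}={value})"
        for value, path in files_by_value.items()
    ]


if __name__ == "__main__":
    # Hypothetical table, column and local file names.
    files = {2012: "/home/user/emp_2012.csv", 2013: "/home/user/emp_2013.csv"}
    for stmt in load_partition_hql("employee_part", "year_of_joining", files):
        print(stmt + ";")
```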

Requirement: Suppose we have some data in a Hive table containing each company's quarterly profit. The requirement is to find each company's maximum profit across all quarters. Sample Data: Each record has 5 columns: company name, quarter 1 as Q1, quarter 2…
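Because the four quarters are columns of the same row, this is a row-wise maximum rather than a `GROUP BY` aggregate; in HiveQL, `greatest(q1, q2, q3, q4)` expresses it. The Python sketch below mirrors that logic on made-up rows, assuming one row per company.

```python
# Hypothetical rows: (company, Q1, Q2, Q3, Q4).
PROFITS = [
    ("CompanyA", 120, 340, 90, 210),
    ("CompanyB", 55, 40, 75, 60),
]


def max_profit(rows):
    """Return {company: highest quarterly profit}, computed row by row.

    Like greatest(q1, q2, q3, q4) in Hive, this takes the maximum across the
    columns of one row. Assumes each company appears in exactly one row.
    """
    return {company: max(quarters) for company, *quarters in rows}


if __name__ == "__main__":
    print(max_profit(PROFITS))
```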

Requirement: Suppose we have data in a Hive table and we want the same data in an HBase table. So, our requirement is to migrate the data from Hive to HBase. Components Involved: Hive (source table), HBase (target table). Solution: We cannot load data directly into an HBase table from the…

Requirement: Suppose there is a table named EMPLOYEE in a MySQL database, and we want its data in the Hadoop ecosystem. So, the requirement is to import data from MySQL into Hive using Sqoop. Once the data is available in Hive, we can process it. Components Involved: In order to achieve the requirement,…
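For orientation, the Python sketch below assembles a typical `sqoop import` command line for this case using standard Sqoop options (`--connect`, `--username`, `-P`, `--table`, `--hive-import`, `--hive-table`, `-m`). The host, database, user and Hive table names are placeholders, and the sketch only prints the command rather than running it.

```python
import shlex


def sqoop_import_cmd(host, database, user, table, hive_table, mappers=1):
    """Assemble a sqoop import command that copies a MySQL table into Hive."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", user,
        "-P",                        # prompt for the password instead of passing it inline
        "--table", table,
        "--hive-import",             # load the imported data into a Hive table
        "--hive-table", hive_table,
        "-m", str(mappers),          # number of parallel map tasks
    ]


if __name__ == "__main__":
    # Placeholder connection details; replace with your own.
    cmd = sqoop_import_cmd("localhost", "testdb", "root", "EMPLOYEE", "employee")
    print(shlex.join(cmd))
```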

Requirement: You have the marks of all the students in a class, with their roll numbers, in a CSV file, and you need to calculate each student's percentage in Hive. Given: Download the sample CSV file, which has 7 columns: the 1st column is the roll number and the other 6 columns are subject1…
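The calculation itself is simple: sum the six subject marks and divide by the maximum possible total. The Python sketch below applies that to one CSV line, assuming (this is not stated in the excerpt) that each subject is marked out of 100; in HiveQL the same idea is the sum of the six mark columns divided by 600, times 100.

```python
MAX_PER_SUBJECT = 100  # assumption: each subject is marked out of 100


def percentage(line):
    """Return (roll_no, percentage) for one CSV line: roll_no,m1,...,m6."""
    fields = line.strip().split(",")
    roll_no = fields[0]
    marks = [float(m) for m in fields[1:7]]
    total_possible = MAX_PER_SUBJECT * len(marks)
    return roll_no, round(sum(marks) / total_possible * 100, 2)


if __name__ == "__main__":
    # Hypothetical student record.
    print(percentage("101,80,70,90,60,50,100"))
```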

Requirement: You have a table in Hive and need to process its data using Pig. To load data directly from a file we generally use PigStorage(), but to load data from a Hive table we need a different load function. Let's go through it step by step. Solution: Step…

Requirement: Assume you have a Hive table named reports, and this dataset needs to be processed in Spark. Once the Hive table's data is in a Spark DataFrame, we can transform it further as the business requires. So let's try to load the Hive table into a Spark data…

Requirement: Suppose the source data is in a text-format file. The requirement is to load the text file into a Hive table using Spark, and then read the data back from the Hive table using Spark. Therefore, let's break the task into sub-tasks: Load…