Load CSV file into hive AVRO table

Requirement: You have a comma-separated (CSV) file and you want to create an Avro table in Hive on top of it, then ...

Load CSV file into hive PARQUET table

Requirement: You have a comma-separated (CSV) file and you want to create a Parquet table in Hive on top of it, then ...

Hive Most Asked Interview Questions With Answers – Part II

What is bucketing and what is it used for? Answer: Bucketing is an optimisation technique which is used to ...

Spark Interview Questions Part-1

Suppose you have a Spark DataFrame which contains millions of records. You need to perform multiple actions on it. How ...

Hive Most Asked Interview Questions With Answers – Part I

What is Hive and why is it useful? Hive is a data warehouse application where data gets stored in the ...

Create a spark dataframe from sample data

Requirement: You have sample data of some students and you want to create a dataframe to perform some operations. Given: ...

Load spark dataframe into non existing hive table

Requirement: You have a dataframe which you want to save into a Hive table for future use. But you do not ...

How to add new column in Spark Dataframe

Requirement: When we ingest data from a source into a Hadoop data lake, we usually add some additional columns with the ...

How to read JSON file in Spark

Requirement: Let’s say we have a set of data which is in JSON format. The file may contain data either ...

How to execute Scala script in Spark without creating Jar

Requirement: The spark-shell is an environment where we can run Spark Scala code and see the output on the ...

Join in hive with example

Requirement: You have two tables named A and B, and you want to perform all types of join in ...

Join in pyspark with example

Requirement: You have two tables named A and B, and you want to perform all types of join in ...

Join in spark using scala with example

Requirement: You have two tables named A and B, and you want to perform all types of join in ...

Java UDF to convert String to date in PIG

About Code: It often happens that you receive data from many systems and each system operates on a ...

Hive tips and shortcuts

#execute a query in silent (-S) mode
hive -S -e "your hive query"
#execute a query
hive -e "your hive query"
#show ...

How to calculate Rank in dataframe using python with example

Requirement: You have the marks of all the students in a class and you want to find the ranks of the students using ...

How to calculate Rank in dataframe using scala with example

Requirement: You have the marks of all the students in a class and you want to find the ranks of the students using ...

Windowing Functions in Hive

Requirement: In this post, we are going to explore windowing functions in Hive. These are the windowing functions: LEAD, LAG ...

Analytics Functions in Hive

Requirement: In this post, we are going to explore analytics functions in Hive. The following analytics functions are available ...

Bucketing in Hive

Requirement: In this post, we will go through the concept of bucketing in Hive. This post will cover the following ...

Exclude Column(s) From Select Query in Hive

Requirement: There is an uncertain number of columns present in the Hive table. Sometimes a table can have a large number ...

__hive_default_partition__ in Hive

Requirement: In this post, we are going to understand what __hive_default_partition__ is in Hive and why it gets created. Components ...

Drop multiple partitions in Hive

Requirement: Suppose we have a partitioned Hive table. This table is partitioned by the year of joining. Our requirement ...

Load files into Hive Partitioned Table

Requirement: There are two files which contain employees' basic information. One file stores the details of employees who joined in the ...

Find max value of a row in Hive

Requirement: Suppose we have some data in a Hive table. The table contains information about a company's quarter-wise profit ...

Transpose Data in Spark DataFrame using PySpark

Requirement: Let's take a scenario where we have already loaded data into an RDD/DataFrame. We got the rows data into ...

Import data from RDBMS to Hadoop

Requirement: In this post, we are going to import data from an RDBMS to Hadoop. Here, we have MySQL as an ...

Import CSV data into HBase

Requirement: In this post, we have data in a CSV file. This file contains basic information about employees. We ...

Data migration from Hive to HBase

Requirement: Suppose we have data in a Hive table. We want the same data in an HBase table. So, our requirement is ...

Import data from MySQL to HBase using Sqoop

Requirement: Suppose we have data in a table EMPLOYEE in a MySQL database. We want to ingest the same data in NoSQL ...

Import data from MySQL into Hive using Sqoop

Requirement: Suppose there is a table named EMPLOYEE in a MySQL database. We want this table's data in the Hadoop ecosystem. So, ...

Read CSV file in Spark Scala

Requirement: Suppose we have a dataset which is in CSV format. We want to read the file in Spark using ...

Find max value in Spark RDD using Scala

Requirement: Suppose we have a source file which contains basic information about employees, like employee number, employee name, designation, ...

How to get partition record in Spark Using Scala

Requirement: Suppose we have a text-format data file which contains employees' basic details. When we load this file ...

How to create spark application in IntelliJ

Requirement: The spark-shell creates an instance of the Spark context as sc. Also, we don't need to resolve dependencies while ...

Calculate percentage in hive

Requirement: You have the marks of all the students of a class, with roll numbers, in a CSV file. It is needed ...

Calculate percentage using pig

Requirement: You have the marks of all the students of a class, with roll numbers, in a CSV file. It is needed ...

Calculate percentage in spark using scala

Requirement: You have the marks of all the students of a class, with roll numbers, in a CSV file. It is needed ...

How to find duplicate record using Map Reduce

Requirement: Suppose you have data files which contain duplicate records, i.e. a line of a file occurs ...

Load multi character delimited file into hive

Requirement: You have a file which is delimited by multiple characters (%$) and you want to create a table in ...

How to find the number of records using Map Reduce

Requirement: In a real-world scenario, data files contain many records. Also, there may be many data files available. In that ...

Load hive table into pig

Requirement: You have a table in Hive, and you need to process the data of that Hive table using ...

Load timestamp values from file into pig

Requirement: Assume that you want to load a file having timestamp values (yyyy-MM-dd HH:mm:ss) into Pig. After loading into Pig ...

How to find duplicate value using Map Reduce

Requirement: Suppose you get data files which contain users' basic information like first name, last name, designation, city, etc. ...

How to get distinct words of a file using Map Reduce

Requirement: Suppose you have a file full of content. In this file, many words are repeated. Now the requirement ...

SUM in pig

Problem 1: Write a Pig script to calculate the sum of profits earned by selling a particular product. Below is ...

Load hive table into spark using Scala

Requirement: Assume you have a Hive table named reports. It is required to process this dataset in Spark. Once ...

Load xml file in pig

Requirement: Assume you have an XML file which is transferred to your local system by some other application. The file ...

Load pipe delimited file in pig

Requirement: Assume that you want to load a file (which has pipe (|) separated values) in Pig, and the output of Pig should ...

Load tsv file in pig

Requirement: Assume that you want to load a TSV (tab-separated values) file in Pig, and the output of Pig should be pipe ...

Pass variables from shell script to pig script

Requirement: You have a Pig script which expects some variables that need to be passed from a shell script. Say ...

Load Text file into Hive Table Using Spark

Requirement: Suppose the source data is in a file. The file format is a text format. The requirement is to ...

Load JSON Data into Hive Partitioned table using PySpark

Requirement: In the last post, we demonstrated how to load JSON data into a non-partitioned Hive table. This time having ...

Pass variables from shell script to hive script

Requirement: You have a Hive script which expects some variables that need to be passed from a shell script. Say ...

Filter records in pig

Requirement: In the source data, you have user information consisting of mobile connection type and ID. You have four types of possible connection ...

String to Date conversion in hive

Requirement: Generally we receive data from different sources which usually have different types of date formats. When we create a ...

Join in pig

Requirement: You have two tables named A and B, and you want to perform all types of join in ...

Load CSV file into hive ORC table

Requirement: You have a comma-separated file and you want to create an ORC-formatted table in Hive on top of ...

Export hive data into file

Requirement: You have a Hive table named infostore which is present in the bdp schema. One more application is connected to ...

Load JSON Data in Hive non-partitioned table using Spark

Requirement: Suppose there is source data which is in JSON format. The requirement is to load the JSON data in ...

Partitioning in Hive

Requirement: Suppose there is source data which needs to be stored in a partitioned Hive table. So our requirement is ...

Parse XML data in Hive

Requirement: Suppose you have an XML-formatted data file. This source file contains some empty tags. The requirement is ...

Split one column into multiple columns in hive

Requirement: You have a table in Hive with one column and you want to split this column into multiple columns ...

Load csv file in pig

Requirement: Assume that you want to load a CSV file in Pig, and the output of Pig should be pipe-delimited and ...

Load CSV file in hive

Requirement: If you have a comma-separated file and you want to create a table in Hive on top of ...

Remove Header of CSV File in hive

Requirement: You have a CSV file which is present at an HDFS location, and you want to create a Hive layer ...