Tutorials

  • Tiny Projects
  • Real Life Scenario
  • Free download
  • Requirement Based
  • Tested Code
  • Sample Data
  • Shortcuts
  • Quizzes
  • Optimized Code

All Tutorials

Load hive table into spark using Scala
Requirement Assume you have a Hive table named reports. This dataset needs to be processed in Spark. Once
Read more.
Calculate percentage in hive
Requirement You have the marks of all the students of a class, with roll numbers, in a CSV file. It is needed
Read more.
Load timestamp values from file into pig
Requirement Assume that you want to load a file having timestamp values (yyyy-MM-dd HH:mm:ss) into Pig. After loading into Pig
Read more.
Load tsv file in pig
Requirement Assume that you want to load a TSV (tab-separated values) file in Pig and the output of Pig should be pipe
Read more.
SUM in pig
Problem 1 Write a pig script to calculate the sum of profits earned by selling a particular product. Below is
Read more.
How to find duplicate value using Map Reduce
Requirement Suppose you get data files which contain users’ basic information like first name, last name, designation, city etc.
Read more.
Load hive table into pig
Requirement You have one table in Hive, and you need to process the data of that Hive table using
Read more.
Load JSON Data into Hive Partitioned table using PySpark
Requirement In the last post, we demonstrated how to load JSON data into a Hive non-partitioned table. This time having
Read more.
Data migration from Hive to HBase
Requirement Suppose we have data in a Hive table. We want the same data in an HBase table. So, our requirement is
Read more.
How to create spark application in IntelliJ
Requirement spark-shell creates an instance of the Spark context as sc. Also, we don’t need to resolve dependencies while
Read more.
Load csv file in pig
Requirement Assume that you want to load a CSV file in Pig and the output of Pig should be pipe delimited and
Read more.
Hive tips and shortcuts
#execute a query in silent (-S) mode
hive -S -e "your hive query"
#execute a query
hive -e
Read more.
Remove Header of CSV File in hive
Requirement You have one CSV file which is present at an HDFS location, and you want to create a Hive layer
Read more.
Load Text file into Hive Table Using Spark
Requirement Suppose the source data is in a text-format file. The requirement is to
Read more.
Load multi character delimited file into hive
Requirement You have a file which is delimited by multiple characters (%$) and you want to create a table in
Read more.
String to Date conversion in hive
Requirement: Generally we receive data from different sources, which usually have different date formats. When we create a
Read more.
Load CSV file into hive ORC table
Requirement You have a comma-separated file and you want to create an ORC-formatted table in Hive on top of
Read more.
Load xml file in pig
Requirement Assume you have an XML file which is transferred to your local system by some other application. The file
Read more.
Import data from MySQL to HBase using Sqoop
Requirement Suppose we have data in a table EMPLOYEE in a MySQL database. We want to ingest the same data in NoSQL
Read more.
How to find the number of records using Map Reduce
Requirement In a real-world scenario, data files contain many records. Also, there may be many data files available. In that
Read more.
How to calculate Rank in dataframe using python with example
Requirement: You have the marks of all the students of a class and you want to find the ranks of students using
Read more.
Join in spark using scala with example
Requirement You have two tables named A and B, and you want to perform all types of join in
Read more.
Calculate percentage in spark using scala
Requirement You have the marks of all the students of a class, with roll numbers, in a CSV file. It is needed
Read more.
Load CSV file in hive
Requirement If you have a comma-separated file and you want to create a table in Hive on top of
Read more.
How to get partition record in Spark Using Scala
Requirement Suppose we have a text-format data file which contains employees’ basic details. When we load this file
Read more.
Calculate percentage using pig
Requirement You have the marks of all the students of a class, with roll numbers, in a CSV file. It is needed
Read more.
How to find duplicate record using Map Reduce
Requirement Suppose you have data files which contain duplicate records, i.e. a line of a file is occurring
Read more.
Load files into Hive Partitioned Table
Requirement There are two files which contain employees’ basic information. One file stores the details of employees who joined in the
Read more.
How to get distinct words of a file using Map Reduce
Requirement Suppose you have a file full of content. In this file, many words are repeated. Now the requirement
Read more.
Find max value of a row in Hive
Requirement Suppose we have some data in a Hive table. The table contains information about a company’s quarter-wise profit.
Read more.
Partitioning in Hive
Requirement Suppose there is source data which must be stored in a Hive partitioned table. So our requirement is
Read more.
Load pipe delimited file in pig
Requirement Assume that you want to load a file (which has pipe (|) separated values) in Pig and the output of Pig should
Read more.
Export hive data into file
Requirement You have one Hive table named infostore which is present in the bdp schema. One more application is connected to
Read more.
Find max value in Spark RDD using Scala
Requirement Suppose we have a source file which contains basic information about employees like employee number, employee name, designation,
Read more.
Transpose Data in Spark DataFrame using PySpark
Requirement Let’s take a scenario where we have already loaded data into an RDD/Dataframe. We got the rows data into
Read more.
Import CSV data into HBase
Requirement In this post, we have data in a CSV file. This file contains basic information about employees. We
Read more.
Split one column into multiple columns in hive
Requirement You have one table in Hive with one column and you want to split this column into multiple columns
Read more.
Join in pyspark with example
Requirement You have two tables named A and B, and you want to perform all types of join in
Read more.
Pass variables from shell script to hive script
Requirement You have one Hive script which expects some variables that need to be passed from a shell script. Say
Read more.
Join in hive with example
Requirement You have two tables named A and B, and you want to perform all types of join in
Read more.
Load JSON Data in Hive non-partitioned table using Spark
Requirement Suppose there is source data in JSON format. The requirement is to load JSON data in
Read more.
Import data from MySQL into Hive using Sqoop
Requirement Suppose there is a table named EMPLOYEE in a MySQL database. We want this table’s data in the Hadoop ecosystem. So,
Read more.
How to calculate Rank in dataframe using scala with example
Requirement: You have the marks of all the students of a class and you want to find the ranks of students using
Read more.
Join in pig
Requirement You have two tables named A and B, and you want to perform all types of join in
Read more.
Drop multiple partitions in Hive
Requirement Suppose we have a Hive partitioned table. This table is partitioned by the year of joining. Our requirement
Read more.
__hive_default_partition__ in Hive
Requirement In this post, we are going to understand what __hive_default_partition__ is in Hive and why it gets created. Components
Read more.
Parse XML data in Hive
Requirement Suppose you have an XML-formatted data file. This source file contains some empty tags. The requirement is
Read more.
Filter records in pig
Requirement: In the source data, you have users’ mobile connection type and Id. You have four types of possible connection
Read more.
Import data from RDBMS to Hadoop
Requirement In this post, we are going to import data from an RDBMS to Hadoop. Here, we have MySQL as an
Read more.
Exclude Column(s) From Select Query in Hive
Requirement There is an uncertain number of columns present in the hive table. Sometimes a table can have many numbers
Read more.
Windowing Functions in Hive
Requirement In this post, we are going to explore windowing functions in Hive. These are the windowing functions: LEAD LAG
Read more.
Read CSV file in Spark Scala
Requirement Suppose we have a dataset in CSV format. We want to read the file in Spark using
Read more.
Bucketing in Hive
Requirement In this post, we will go through the concept of Bucketing in Hive. This post will cover the following
Read more.
Analytics Functions in Hive
Requirement In this post, we are going to explore analytics functions in Hive. The following analytics functions are available
Read more.
Pass variables from shell script to pig script
Requirement You have one Pig script which expects some variables that need to be passed from a shell script. Say
Read more.
Java UDF to convert String to date in PIG
About Code It often happens that you receive data from many systems and each system operates on a
Read more.