Requirement
In this post, we have data in a CSV file that contains basic information about employees. We want to import this CSV data into an HBase table.

Components Involved
HDFS, HBase

Sample Data
Our sample data looks like this:

7369,SMITH,CLERK,7902,12/17/1980,800,20
7499,ALLEN,SALESMAN,7698,2/20/1981,1600,30
7521,WARD,SALESMAN,7698,2/22/1981,1250,30
7566,TURNER,MANAGER,7839,4/2/1981,2975,20
7654,MARTIN,SALESMAN,7698,9/28/1981,1250,30
7698,MILLER,MANAGER,7839,5/1/1981,2850,30
7782,CLARK,MANAGER,7839,6/9/1981,2450,10
7788,SCOTT,ANALYST,7566,12/9/1982,3000,20
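
Before loading anything, you have to decide how each CSV row maps onto an HBase row key and "family:qualifier" cells. The plain-Python sketch below only illustrates that mapping and does not talk to HBase. It assumes the first field (7369, 7499, ...) is the row key, and it uses a hypothetical column family `emp` with hypothetical qualifier names, because the excerpt does not name the columns.

```python
import csv
import io

# The first two rows of the sample data above.
SAMPLE = """7369,SMITH,CLERK,7902,12/17/1980,800,20
7499,ALLEN,SALESMAN,7698,2/20/1981,1600,30
"""

# Hypothetical column family and qualifiers: the post does not name them.
FAMILY = "emp"
QUALIFIERS = ["name", "job", "manager", "hire_date", "salary", "dept"]


def to_hbase_cells(lines):
    """Map each CSV row to (row_key, {"family:qualifier": value}).

    Assumes the first field is the row key and every other field becomes
    one cell in a single column family.
    """
    rows = []
    for fields in csv.reader(lines):
        if not fields:
            continue  # ignore blank lines
        row_key, values = fields[0], fields[1:]
        if len(values) != len(QUALIFIERS):
            raise ValueError(f"row {row_key!r} has {len(values)} values, "
                             f"expected {len(QUALIFIERS)}")
        cells = {f"{FAMILY}:{q}": v for q, v in zip(QUALIFIERS, values)}
        rows.append((row_key, cells))
    return rows


if __name__ == "__main__":
    for key, cells in to_hbase_cells(io.StringIO(SAMPLE)):
        print(key, cells)
```

If you load the file with HBase's bundled ImportTsv tool, you express the same decision as the comma-separated `importtsv.columns` list, where `HBASE_ROW_KEY` marks the row-key field, and you set `importtsv.separator` to `,` for CSV input.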

Requirement
Suppose we have a dataset in CSV format and want to read the file in Spark using Scala. The requirement, then, is to create a Spark application that reads a CSV file into a Spark DataFrame using Scala.

Components Involved
The following components are involved: Spark (RDD/DataFrame) and Scala.

Requirement
You have the marks of all the students in a class, along with their roll numbers, in a CSV file. You need to calculate each student's percentage in Hive.

Given: Download the sample CSV file, which has 7 columns: the 1st column is the roll number and the other 6 columns are subject1…
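
The per-student arithmetic is the sum of the six subject marks divided by the maximum possible total, times 100. The plain-Python sketch below shows only that calculation. It assumes every subject is marked out of 100 (the excerpt does not say), and the sample rows are made up to match the described layout.

```python
import csv
import io

# Made-up rows in the described layout: roll no, then six subject marks.
SAMPLE = """1,80,75,90,60,85,70
2,55,65,70,50,45,60
"""

MAX_PER_SUBJECT = 100  # assumption: every subject is out of 100
NUM_SUBJECTS = 6


def percentages(lines):
    """Return {roll_no: percentage rounded to 2 decimals} for each CSV row."""
    result = {}
    for fields in csv.reader(lines):
        if not fields:
            continue  # ignore blank lines
        roll_no = fields[0]
        marks = [float(m) for m in fields[1:]]
        if len(marks) != NUM_SUBJECTS:
            raise ValueError(f"roll no {roll_no!r}: expected {NUM_SUBJECTS} marks")
        total = sum(marks)
        result[roll_no] = round(total * 100 / (MAX_PER_SUBJECT * NUM_SUBJECTS), 2)
    return result


if __name__ == "__main__":
    print(percentages(io.StringIO(SAMPLE)))  # {'1': 76.67, '2': 57.5}
```

In Hive, the same calculation would be a single expression in a SELECT, for example `(subject1 + subject2 + subject3 + subject4 + subject5 + subject6) * 100 / 600`. The column names after subject1 and the maximum of 600 are assumptions.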

Requirement
You have a CSV file at an HDFS location and want to create a Hive layer on top of this data. However, the CSV file has two header lines at the top, and you don't want them to appear in your Hive table, so let's solve…
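
Whatever reads the file has to drop exactly the first two lines and keep every data row. In Hive, the table property `skip.header.line.count` (for example `'skip.header.line.count'='2'` in TBLPROPERTIES) tells Hive to ignore that many leading lines of each data file. The excerpt does not show which approach the full post uses. The plain-Python sketch below only illustrates the filtering step, and the file contents are made up.

```python
import csv
import io
from itertools import islice

HEADER_LINES = 2  # the excerpt says the file has two header lines

# Made-up file contents: two header lines followed by data rows.
SAMPLE = """# employee export
id,name,city
1,alice,pune
2,bob,delhi
"""


def data_rows(lines, skip=HEADER_LINES):
    """Yield parsed CSV rows after dropping the first `skip` lines."""
    body = islice(lines, skip, None)  # lazily skip the header lines
    for fields in csv.reader(body):
        if fields:  # ignore blank lines
            yield fields


if __name__ == "__main__":
    for row in data_rows(io.StringIO(SAMPLE)):
        print(row)
```

Skipping is done line by line, which matches how a fixed header count works. It assumes the header lines contain no quoted embedded newlines.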