
Requirement
CSV is a very common file format used in many applications. Sometimes the data contains values that need special handling, for example a comma within a value, quotes, or multiline fields. Spark provides options to handle these cases while processing the data.
Read More →
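To make these cases concrete, here is a minimal sketch that uses Python's standard csv module rather than Spark. The sample rows and column names are hypothetical and only illustrate a comma inside a value, escaped quotes, and a field that spans two lines. Spark's CSV reader exposes options such as quote, escape, and multiLine for comparable situations.

```python
import csv
import io

# Hypothetical sample: one value contains a comma, one contains
# doubled (escaped) quotes, and one spans two lines.
raw = (
    'id,name,comment\n'
    '1,"Smith, John","said ""hello"""\n'
    '2,Allen,"line one\nline two"\n'
)

# newline="" leaves line endings untouched so the csv module can
# handle line breaks that occur inside quoted fields.
reader = csv.reader(io.StringIO(raw, newline=""), quotechar='"', doublequote=True)
rows = list(reader)

for row in rows:
    print(row)
```

Quoting is what keeps the embedded comma and the embedded line break inside a single field, so the file still parses into three logical rows.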

Requirement
Let's say we receive data from multiple sources and need to ingest it into a single target table. The sources can have different schemas. We want to merge the data and load it into one table.

Sample Data
Emp_data1.csv
empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD
9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE
9566,TURNER,MANAGER,7839,4/2/1981,2975,20,MUMBAI
9654,MARTIN,SALESMAN,7698,9/28/1981,1250,30,CHENNAI
9369,SMITH,CLERK,7902,12/17/1980,800,20,KOLKATA
Read More →
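The idea of merging sources with different schemas can be sketched in plain Python, under stated assumptions: the first source reuses two rows from Emp_data1.csv, while the second source (without a location column) and the helper name merge_sources are hypothetical. Columns missing from a source are filled with None. This is not the post's own method, which the excerpt does not show; in Spark, unionByName with allowMissingColumns (Spark 3.1 and later) has a similar effect.

```python
import csv
import io

# Two rows taken from the Emp_data1.csv sample.
emp_data1 = (
    "empno,ename,designation,manager,hire_date,sal,deptno,location\n"
    "9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE\n"
    "9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD\n"
)

# Hypothetical second source that lacks the location column.
emp_data2 = (
    "empno,ename,designation,manager,hire_date,sal,deptno\n"
    "7369,SMITH,CLERK,7902,12/17/1980,800,20\n"
)

def merge_sources(*csv_texts):
    """Merge CSV texts with differing headers into one list of dicts."""
    readers = [csv.DictReader(io.StringIO(text)) for text in csv_texts]

    # Build the union of column names in first-seen order.
    columns = []
    for reader in readers:
        for name in reader.fieldnames:
            if name not in columns:
                columns.append(name)

    # Re-shape every row to the merged schema; absent columns become None.
    merged = [
        {col: row.get(col) for col in columns}
        for reader in readers
        for row in reader
    ]
    return columns, merged

columns, merged = merge_sources(emp_data1, emp_data2)
print(columns)
print(merged[-1])
```

Matching columns by name rather than by position is the design choice that matters here, because the sources do not agree on how many columns they carry.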

Requirement
In this post, we have data in a CSV file that contains basic information about employees. We want to import this CSV data into an HBase table.

Components Involved
HDFS
HBase

Sample Data
Our sample data looks like this:
7369,SMITH,CLERK,7902,12/17/1980,800,20
7499,ALLEN,SALESMAN,7698,2/20/1981,1600,30
7521,WARD,SALESMAN,7698,2/22/1981,1250,30
7566,TURNER,MANAGER,7839,4/2/1981,2975,20
7654,MARTIN,SALESMAN,7698,9/28/1981,1250,30
7698,MILLER,MANAGER,7839,5/1/1981,2850,30
7782,CLARK,MANAGER,7839,6/9/1981,2450,10
7788,SCOTT,ANALYST,7566,12/9/1982,3000,20
7839,KING,PRESIDENT,NULL,11/17/1981,5000,10
7844,TURNER,SALESMAN,7698,9/8/1981,1500,30
Read More →

Requirement
You have a CSV file at an HDFS location and want to create a Hive layer on top of this data. However, the CSV file has two header lines at the top, and you don't want them to appear in your Hive table, so let's solve
Read More →
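As a rough illustration of the requirement, the sketch below drops the first two lines of a CSV before parsing it, using plain Python. The sample content, the constant HEADER_LINES, and the helper data_rows are all hypothetical, and the post's own solution is not shown in this excerpt. In Hive, one commonly used mechanism for this is the skip.header.line.count table property.

```python
import csv
import io
from itertools import islice

HEADER_LINES = 2  # the requirement mentions two header lines

# Hypothetical file content: a title line, a column-name line, then data.
raw = (
    "Employee extract\n"
    "empno,ename,deptno\n"
    "7369,SMITH,20\n"
    "7499,ALLEN,30\n"
)

def data_rows(text, skip=HEADER_LINES):
    """Parse CSV text, ignoring the first `skip` lines."""
    lines = io.StringIO(text)
    # islice discards the leading lines lazily, without loading the file twice.
    return list(csv.reader(islice(lines, skip, None)))

rows = data_rows(raw)
print(rows)
```

Skipping by line count assumes that neither header line contains a quoted line break; a header spanning several physical lines would need a different count.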