big data

Write spark dataframe into Parquet files using scala

Tagged: apache spark, big data, hadoop, scala, Spark, spark dataframe, spark tutorials, Write spark dataframe into Parquet files using scala, write to parquet

Requirement: You have a sample DataFrame that you want to load into Parquet files using Scala. Solution: Step 1: Sample DataFrame. Use the command below: spark-shell Note: I am using Spark version 2.3. To create a sample DataFrame, please refer to Create-a-spark-dataframe-from-sample-data. After following the above post, you can see that…
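As a rough sketch of where this leads, the write itself is usually a single call on the DataFrame. The example below is only an assumption-laden illustration, not the post's own code: the object name `WriteParquetSketch`, the sample rows and the output path `/tmp/bdp/parquet_sample` are hypothetical, and a local SparkSession is built explicitly because outside spark-shell no `spark` value exists yet.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object WriteParquetSketch {
  // Writes any DataFrame as Parquet files under `path`.
  // SaveMode.Overwrite replaces output left by an earlier run.
  def writeParquet(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(path)

  def main(args: Array[String]): Unit = {
    // spark-shell already provides `spark`; here we build a local session instead.
    val spark = SparkSession.builder()
      .appName("write-parquet-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the post's sample DataFrame.
    val df = Seq((1, "SMITH", 800), (2, "ALLEN", 1600)).toDF("empno", "ename", "sal")

    val outPath = "/tmp/bdp/parquet_sample" // hypothetical output directory
    writeParquet(df, outPath)

    // Read the files back to check that the schema and rows survived the round trip.
    spark.read.parquet(outPath).show()
    spark.stop()
  }
}
```

Spark writes a directory of `part-*.parquet` files rather than one file, so the path you pass is a folder.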

Read CSV File With New Line in Spark

Tagged: big data, csv file, data frame, multiline delimiter, Spark, spark dataframe

Requirement: CSV files are a very common data source, but processing them sometimes fails, and there can be several reasons for that. In this post, we discuss one such issue: a NEW LINE character inside a field value. In this demonstration, we will first understand…
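One common way to handle quoted fields that contain line breaks is Spark's `multiLine` CSV reader option (available for CSV since Spark 2.2). The sketch below assumes that approach; the post itself may solve it differently. The object name, the column names and the temporary sample file are hypothetical, and the option only helps when the embedded newline sits inside a quoted field.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineCsvSketch {
  // Without multiLine, Spark treats every newline as the end of a record and
  // splits a quoted value across two rows; with it, quoted fields may span lines.
  def readMultiLineCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("multiLine", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multiline-csv-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample: the address of row 1 contains a newline inside quotes.
    val csv = "id,name,address\n1,SMITH,\"12 Main St\nBANGALORE\"\n2,ALLEN,HYDERABAD\n"
    val file = java.nio.file.Files.createTempFile("multiline", ".csv")
    java.nio.file.Files.write(file, csv.getBytes("UTF-8"))

    // Expect 2 logical rows even though the file has 4 physical lines.
    readMultiLineCsv(spark, file.toString).show(false)
    spark.stop()
  }
}
```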

Merge Multiple Data Frames in Spark

Tagged: big data, csv, merge, reduce, scala, seq, Spark, spark dataframe, union, union all

Requirement: Let's say we are getting data from multiple sources, but we need to ingest it into a single target table. The sources can have different schemas. We want to merge the data and load it into one table.

Sample Data

Emp_data1.csv

empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD
9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE
9566,TURNER,MANAGER,7839,4/2/1981,2975,20,MUMBAI
9654,MARTIN,SALESMAN,7698,9/28/1981,1250,30,CHENNAI
9369,SMITH,CLERK,7902,12/17/1980,800,20,KOLKATA
…
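Because `union` in Spark matches columns by position, frames with different schemas have to be aligned first. The sketch below is one assumed way to do it, not necessarily the post's: it collects the union of all column names, fills each frame's missing columns with typed nulls, and folds the list together with `reduce`. The object name is hypothetical, the second source's schema is invented for illustration, and the list passed in must be non-empty.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object MergeFramesSketch {
  def mergeAll(dfs: Seq[DataFrame]): DataFrame = {
    val fields = dfs.flatMap(_.schema.fields)
    val allCols = fields.map(_.name).distinct             // first-seen column order
    val typeOf = fields.map(f => f.name -> f.dataType).toMap
    val aligned = dfs.map { df =>
      val existing = df.columns.toSet
      // Same columns in the same order for every frame; missing ones become typed nulls.
      df.select(allCols.map { c =>
        if (existing.contains(c)) col(c) else lit(null).cast(typeOf(c)).as(c)
      }: _*)
    }
    // union keeps duplicates (like SQL UNION ALL); call .distinct() if they must go.
    aligned.reduce(_ union _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("merge-frames-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // First two rows of Emp_data1.csv, trimmed to four columns.
    val emp1 = Seq(
      ("9369", "SMITH", "CLERK", "BANGALORE"),
      ("9499", "ALLEN", "SALESMAN", "HYDERABAD")
    ).toDF("empno", "ename", "designation", "location")
    // Hypothetical second source with a different schema.
    val emp2 = Seq(("9521", "WARD", "1250")).toDF("empno", "ename", "sal")

    mergeAll(Seq(emp1, emp2)).show()
    spark.stop()
  }
}
```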

Certifications

Tagged: ai, big data, certification, cloud, data engineer, data science, machine learning

Recommended certifications for data engineers and data scientists: 1. Artificial Intelligence. Click the link below to get started: IBM AI Engineering. About the course (by IBM): "The rapid pace of innovation in Artificial Intelligence (AI) is creating enormous opportunity for transforming entire industries and our very…

Export hive table to excel

Tagged: big data, excel, export, file, hive

Requirement: You have a Hive table named infostore in the bdp schema, and you need to get its data into an Excel file. Solution: Let's say the output file should be placed in /root/local_bdp/posts/export-hive-data-into-file. Step 1: Create the output directory: mkdir /root/local_bdp/posts/export-hive-data-into-file/output Step 2: Go to the Hive CLI…
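The post goes through the Hive CLI. If Spark is available instead, a comparable route (a swapped-in alternative, not the post's method) is to read the table with `spark.table` and write a single CSV file with a header row, which Excel opens directly. The sketch assumes a Spark build with Hive support that can reach the metastore, and runs in local mode so the `file://` path lands on the machine you are logged into; the object name is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object ExportForExcelSketch {
  // coalesce(1) produces one part file (suitable for small tables only);
  // the header row gives Excel its column names.
  def exportCsv(df: DataFrame, outDir: String): Unit =
    df.coalesce(1)
      .write
      .mode(SaveMode.Overwrite)
      .option("header", "true")
      .csv(outDir)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("export-hive-to-excel-sketch")
      .master("local[*]")  // local mode, so file:// refers to this machine's disk
      .enableHiveSupport() // assumes Hive classes and metastore access are available
      .getOrCreate()

    exportCsv(
      spark.table("bdp.infostore"),
      "file:///root/local_bdp/posts/export-hive-data-into-file/output"
    )
    spark.stop()
  }
}
```

Spark writes a directory containing one `part-*.csv` file plus marker files; that part file is what you open in Excel.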

Scenario based interview questions on Big Data

Tagged: big data, hbase, hive, interview, qa, scenario based interview questions, scenario-based, Spark

1. There are 50 columns in a Spark DataFrame, say df. All the columns need to be cast to string. To keep the code generic, you should not cast each column individually by writing out its name. How would you achieve this in Spark using Scala? Answer: As…
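A common generic answer is to build the cast expressions from `df.columns` and pass them all to `select`, so no column name is ever typed by hand. The sketch below assumes that approach; the object name is hypothetical and the 50-column frame is generated only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object CastAllSketch {
  // One cast per column, derived from the schema, so it works for 5 or 50 columns.
  // .as(c) keeps the original column names after the cast.
  def castAllToString(df: DataFrame): DataFrame =
    df.select(df.columns.map(c => col(c).cast("string").as(c)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cast-all-sketch").master("local[*]").getOrCreate()

    // Hypothetical frame with 50 integer columns c1..c50, mirroring the question.
    val df = spark.range(1).select((1 to 50).map(i => lit(i).as(s"c$i")): _*)

    castAllToString(df).printSchema() // every field should now be a string
    spark.stop()
  }
}
```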