Spark with Scala

Requirement: The CSV file format is very common and is used in many applications. Sometimes a CSV file contains data that needs special handling, for example a comma within a value, quotes, or a value that spans multiple lines. To handle these cases, Spark provides options that can be set while reading the data.
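A minimal sketch of those reader options, not the post's own code: it writes a small hypothetical CSV file (a comma inside a quoted value, quotes escaped by doubling them, and a quoted value split across two lines) to a temporary location, then reads it back with `header`, `quote`, `escape`, and `multiLine`. The file contents and column names are assumptions for illustration; the snippet can be pasted into `spark-shell` with `:paste`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Returns the existing spark-shell session if there is one, otherwise starts a local session.
val spark = SparkSession.builder().master("local[*]").appName("csv-options").getOrCreate()

// Hypothetical sample file: "Smith, John" contains a comma, ""hi"" is an escaped quote,
// and the last value continues onto a second line inside its quotes.
val csvText =
  """id,name,remarks
    |1,"Smith, John","said ""hi"" twice"
    |2,Jane,"line one
    |line two"
    |""".stripMargin
val csvPath = Files.createTempFile("quoted", ".csv")
Files.write(csvPath, csvText.getBytes(StandardCharsets.UTF_8))

val csvDf = spark.read
  .option("header", "true")     // first line holds the column names
  .option("quote", "\"")        // values containing commas are wrapped in double quotes
  .option("escape", "\"")       // a quote inside a quoted value is written as ""
  .option("multiLine", "true")  // a quoted value may continue onto the next line
  .csv(csvPath.toString)

csvDf.show(truncate = false)
```

Spark's default escape character is a backslash, so files that escape quotes by doubling them usually need `escape` set to `"` as above. `multiLine` also stops Spark from splitting a single file across tasks, which can matter for very large files.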

Requirement: You have a sample DataFrame and want to delete some columns from it.
Solution:
Step 1: Sample DataFrame
Use the command below:
spark-shell
Note: I am using Spark version 2.3.
To create a sample DataFrame, please refer to Create-a-spark-dataframe-from-sample-data. After following the above post, you can see that the students DataFrame…
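A minimal sketch of dropping columns, under stated assumptions: the students DataFrame from the referenced post is not shown here, so the one below, including its column names and rows, is a hypothetical stand-in. It shows `drop` with several column names and, as an alternative, selecting everything except a list of names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("drop-columns").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the students DataFrame from the referenced post.
val students = Seq(
  (1, "Alice", 20, "Math"),
  (2, "Bob", 22, "Physics")
).toDF("id", "name", "age", "subject")

// drop(colNames: String*) returns a new DataFrame; names not in the schema are ignored.
val trimmed = students.drop("age", "subject")

// Alternative: keep every column whose name is not in the removal list.
val toRemove = Set("age", "subject")
val trimmed2 = students.select(students.columns.filterNot(toRemove.contains).map(c => col(c)): _*)

trimmed.printSchema()
```

DataFrames are immutable, so `students` itself still has all four columns; only the returned DataFrame lacks the dropped ones.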

Requirement: You have a sample DataFrame that you want to write into Parquet files using Scala.
Solution:
Step 1: Sample DataFrame
Use the command below:
spark-shell
Note: I am using Spark version 2.3.
To create a sample DataFrame, please refer to Create-a-spark-dataframe-from-sample-data. After following the above post, you can see that…
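A minimal sketch of the write step, assuming a small hypothetical students DataFrame in place of the one from the referenced post and a temporary output directory in place of a real HDFS or local path. It writes Parquet with an overwrite save mode and reads the files back to check the round trip.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("to-parquet").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the students DataFrame.
val students = Seq((1, "Alice", 20), (2, "Bob", 22)).toDF("id", "name", "age")

// Temporary output location; a real job would use its own target path.
val outPath = Files.createTempDirectory("students-parquet").resolve("students").toString

students.write
  .mode(SaveMode.Overwrite) // replace any previous output at this path
  .parquet(outPath)         // the schema is stored inside the Parquet files

// Read the files back; column names and types come from the stored schema.
val reloaded = spark.read.parquet(outPath)
reloaded.show()
```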

Requirement: In this post, we will learn how to convert a table’s schema into a DataFrame in Spark.
Sample Data:
empno  ename  designation  manager  hire_date   sal   deptno  location
9369   SMITH  CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN  SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
9521   WARD   SALESMAN     7698     2/22/1981   …
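One way to do this, sketched here as an assumption rather than the post's own method: build the DataFrame from the two complete sample rows above, then turn each `StructField` of its schema into a row holding the column name, type, and nullability. The output column names `column_name`, `data_type`, and `nullable` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("schema-to-df").getOrCreate()
import spark.implicits._

// The two complete rows of the sample data above.
val emp = Seq(
  (9369, "SMITH", "CLERK", 7902, "12/17/1980", 800, 20, "BANGALORE"),
  (9499, "ALLEN", "SALESMAN", 7698, "2/20/1981", 1600, 30, "HYDERABAD")
).toDF("empno", "ename", "designation", "manager", "hire_date", "sal", "deptno", "location")

// One output row per schema field: its name, its type as a short string, and whether it is nullable.
val schemaDf = emp.schema.fields.toSeq
  .map(f => (f.name, f.dataType.simpleString, f.nullable))
  .toDF("column_name", "data_type", "nullable")

schemaDf.show()
```

The same approach works for any table loaded into Spark, for example one read with `spark.table(...)`, because it only relies on the DataFrame's `schema`.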

Requirement: In this post, we will learn how to handle NULL in a Spark DataFrame. There are multiple ways to handle NULL during data processing, and we will see how to do it in a Spark DataFrame.
Solution: Create a DataFrame with dummy data:
val df = spark.createDataFrame(Seq( (1100, "Person1", "Location1", null), (1200,…
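A minimal sketch of common NULL-handling approaches, not the post's own code: the rows follow the post's Person/Location pattern, but the column names and the e-mail values are hypothetical. `Option` is used so that Spark infers nullable string columns. It shows filtering on NULL, dropping rows with NULLs, filling defaults, and `coalesce`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

val spark = SparkSession.builder().master("local[*]").appName("null-handling").getOrCreate()
import spark.implicits._

// Hypothetical dummy data; None becomes NULL in the DataFrame.
val people = Seq[(Int, String, Option[String], Option[String])](
  (1100, "Person1", Some("Location1"), None),
  (1200, "Person2", None, Some("person2@example.com")),
  (1300, "Person3", Some("Location3"), Some("person3@example.com"))
).toDF("id", "name", "location", "email")

// 1. Find rows where a column is NULL.
val missingEmail = people.filter(col("email").isNull)

// 2. Drop every row that has a NULL in any column.
val complete = people.na.drop()

// 3. Replace NULLs in the listed columns with a default value.
val filled = people.na.fill("unknown", Seq("location"))

// 4. Use a fallback value at query time without modifying the original column.
val withDefault = people.withColumn("email_or_default", coalesce(col("email"), lit("n/a")))

withDefault.show(truncate = false)
```

Note that comparing with `=== null` does not match NULL values; `isNull` and `isNotNull` are the reliable checks.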

Requirement: In this post, we will learn different approaches to selecting a specific column value, or all the columns, in a Spark DataFrame.
Sample Data:
empno  ename  designation  manager  hire_date   sal   deptno  location
9369   SMITH  CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN  SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
…
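A short sketch of several ways to select columns, built from the sample rows above. It is an illustration under that assumption, not necessarily the set of approaches the post covers: string names, `col` and `$`, DataFrame column references, `selectExpr`, and selecting all columns. The `annual_sal` expression is a hypothetical example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("select-columns").getOrCreate()
import spark.implicits._

val emp = Seq(
  (9369, "SMITH", "CLERK", 7902, "12/17/1980", 800, 20, "BANGALORE"),
  (9499, "ALLEN", "SALESMAN", 7698, "2/20/1981", 1600, 30, "HYDERABAD")
).toDF("empno", "ename", "designation", "manager", "hire_date", "sal", "deptno", "location")

val byName   = emp.select("ename", "sal")                        // column names as strings
val byColumn = emp.select(col("ename"), $"sal")                  // Column objects via col() and $"..."
val byDfRef  = emp.select(emp("ename"), emp("sal"))              // columns resolved against this DataFrame
val byExpr   = emp.selectExpr("ename", "sal * 12 AS annual_sal") // SQL expressions as strings
val allCols  = emp.select("*")                                   // every column
val allCols2 = emp.select(emp.columns.map(c => col(c)): _*)      // every column, from the name list

// A single value: the ename of the first row.
val firstName = emp.select("ename").first().getString(0)
```

Referencing columns through `emp("...")` is mainly useful after joins, where two inputs may share a column name.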

Requirement: A UDF is a user-defined function. As its name indicates, a user can create a custom function and use it wherever required. We create a UDF when no built-in function exists for the task or the existing ones cannot fulfill the requirement.
Sample Data:
empno  ename  designation  manager  hire_date  sal  deptno…
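A minimal UDF sketch, assuming the same employee sample data as the earlier posts (only `empno`, `ename`, and `sal` are used) and a hypothetical "salary band" rule with a threshold of 1000. It shows the function used through the DataFrame API with `udf` and through SQL after `spark.udf.register`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[*]").appName("udf-example").getOrCreate()
import spark.implicits._

val emp = Seq((9369, "SMITH", 800), (9499, "ALLEN", 1600)).toDF("empno", "ename", "sal")

// Hypothetical business rule: salaries of 1000 or more are "HIGH".
val bandOf: Int => String = sal => if (sal >= 1000) "HIGH" else "LOW"

// DataFrame API: wrap the Scala function as a UDF and apply it to a column.
val salBand = udf(bandOf)
val withBand = emp.withColumn("sal_band", salBand(col("sal")))

// SQL: register the same function under a name usable in queries.
spark.udf.register("sal_band", bandOf)
emp.createOrReplaceTempView("emp")
val viaSql = spark.sql("SELECT ename, sal_band(sal) AS band FROM emp")

withBand.show()
```

A rule this simple could also be written with the built-in `when` function; UDFs are opaque to Spark's optimizer, so built-ins are generally preferable when they can express the logic.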

Requirement: Let’s say we are getting data from multiple sources, but we need to ingest the data into a single target table. The data can have different schemas. We want to merge the data and load/save it into a table.
Sample Data:
Emp_data1.csv
empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD
9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE
9566,TURNER,MANAGER,7839,4/2/1981,2975,20,MUMBAI
9654,MARTIN,SALESMAN,7698,9/28/1981,1250,30,CHENNAI
9369,SMITH,CLERK,7902,12/17/1980,800,20,KOLKATA
…
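A minimal sketch of merging two inputs whose schemas differ, not the post's own code. The first DataFrame uses two rows of Emp_data1.csv above. The second file is not shown in the excerpt, so its single row and its extra `bonus` column are hypothetical. Each side is padded with typed NULL columns for whatever it lacks, and then the two are combined with `unionByName`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.DataType

val spark = SparkSession.builder().master("local[*]").appName("merge-schemas").getOrCreate()
import spark.implicits._

// First source: rows from Emp_data1.csv.
val src1 = Seq(
  (9369, "SMITH", "CLERK", 7902, "12/17/1980", 800, 20, "BANGALORE"),
  (9499, "ALLEN", "SALESMAN", 7698, "2/20/1981", 1600, 30, "HYDERABAD")
).toDF("empno", "ename", "designation", "manager", "hire_date", "sal", "deptno", "location")

// Second source: hypothetical data with no location column and an extra bonus column.
val src2 = Seq((9999, "DOE", "ANALYST", 7566, "1/1/1982", 3000, 20, 500))
  .toDF("empno", "ename", "designation", "manager", "hire_date", "sal", "deptno", "bonus")

// All column names across both inputs, in first-seen order, with the first type seen for each.
val allCols = (src1.schema.fields ++ src2.schema.fields)
  .map(f => f.name -> f.dataType)
  .foldLeft(Vector.empty[(String, DataType)]) { (acc, c) =>
    if (acc.exists(_._1 == c._1)) acc else acc :+ c
  }

// Adds each missing column as a NULL of the right type, then puts columns in a common order.
def alignTo(df: DataFrame, cols: Seq[(String, DataType)]): DataFrame = {
  val padded = cols.foldLeft(df) { case (acc, (name, dataType)) =>
    if (acc.columns.contains(name)) acc else acc.withColumn(name, lit(null).cast(dataType))
  }
  padded.select(cols.map { case (name, _) => col(name) }: _*)
}

val merged = alignTo(src1, allCols).unionByName(alignTo(src2, allCols))
merged.show()
// Saving to a table (hypothetical table name): merged.write.mode("append").saveAsTable("emp_target")
```

The padding step keeps this working on Spark 2.3. From Spark 3.1, `unionByName(other, allowMissingColumns = true)` fills missing columns with NULL directly. If the same column has different types in the two inputs, cast it to one type before the union.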

Requirement: Load application properties from a text file. In this tutorial, we will learn how to load properties or configs for any Spark application.
Solution:
Step 1: Preparation of required configs
Often the parameters for an application vary from environment to environment. Or, to run multiple instances…
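A minimal sketch using the standard `java.util.Properties` class, which is one common way to do this and not necessarily the tutorial's approach. The file contents and keys (`app.name`, `input.path`, `output.path`, `shuffle.partitions`) are hypothetical. Keeping one such file per environment lets the same jar run with different settings.

```scala
import java.io.{FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.util.Properties

// Loads key=value pairs from a properties-style text file.
def loadProperties(path: String): Properties = {
  val props = new Properties()
  val reader = new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8)
  try props.load(reader) finally reader.close()
  props
}

// Hypothetical config for a "dev" environment, written to a temporary file for the example.
val confText =
  """# dev environment
    |app.name=emp-loader
    |input.path=/data/dev/input
    |output.path=/data/dev/output
    |""".stripMargin
val confPath = Files.createTempFile("app-dev", ".properties")
Files.write(confPath, confText.getBytes(StandardCharsets.UTF_8))

val props = loadProperties(confPath.toString)
val appName    = props.getProperty("app.name")
val inputPath  = props.getProperty("input.path")
val outputPath = props.getProperty("output.path")
val partitions = props.getProperty("shuffle.partitions", "200") // default used when the key is absent

// These values could then feed the job, for example:
// SparkSession.builder().appName(appName).config("spark.sql.shuffle.partitions", partitions).getOrCreate()
```

In practice the path to the properties file is usually passed as a command-line argument to the application, so each environment or instance can point at its own file.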

Requirement: Let’s say we are getting data from two different sources (i.e., an RDBMS table and a file), and we need to merge the data into a single DataFrame. Both sources have the same schema.
Sample Data:
MySQL Table Data:
empno,ename,designation,manager,hire_date,sal,deptno
7369,SMITH,CLERK,7902,12/17/1980,800,20
7499,ALLEN,SALESMAN,7698,2/20/1981,1600,30
7521,WARD,SALESMAN,7698,2/22/1981,1250,30
7566,TURNER,MANAGER,7839,4/2/1981,2975,20
7654,MARTIN,SALESMAN,7698,9/28/1981,1250,30
CSV File Data:
empno,ename,designation,manager,hire_date,sal,deptno…
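A minimal sketch of the merge, under stated assumptions: a real job would read the table over JDBC and the file with the CSV reader, as the comments show with a hypothetical URL, table, credentials, and path. To keep the example runnable without a database, the MySQL rows above are built in memory. The CSV rows are cut off in the excerpt, so the single file row here is a hypothetical stand-in.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("merge-sources").getOrCreate()
import spark.implicits._

val cols = Seq("empno", "ename", "designation", "manager", "hire_date", "sal", "deptno")

// Real job (hypothetical connection details; the MySQL JDBC driver must be on the classpath):
// val tableDf = spark.read.format("jdbc")
//   .option("url", "jdbc:mysql://localhost:3306/empdb")
//   .option("dbtable", "emp")
//   .option("user", "user").option("password", "password")
//   .load()
val tableDf = Seq(
  (7369, "SMITH", "CLERK", 7902, "12/17/1980", 800, 20),
  (7499, "ALLEN", "SALESMAN", 7698, "2/20/1981", 1600, 30),
  (7521, "WARD", "SALESMAN", 7698, "2/22/1981", 1250, 30),
  (7566, "TURNER", "MANAGER", 7839, "4/2/1981", 2975, 20),
  (7654, "MARTIN", "SALESMAN", 7698, "9/28/1981", 1250, 30)
).toDF(cols: _*)

// Real job: spark.read.option("header", "true").option("inferSchema", "true").csv("<path to CSV file>")
// Hypothetical stand-in row for the truncated CSV data.
val fileDf = Seq((7698, "BLAKE", "MANAGER", 7839, "5/1/1981", 2850, 30)).toDF(cols: _*)

// Both sides share one schema; unionByName matches columns by name rather than by position.
val combined = tableDf.unionByName(fileDf)
combined.show()
```

"Same schema" has to hold for column types as well as names: if the JDBC source returns, say, DECIMAL for `sal` while the CSV reader infers an integer, cast one side before the union.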