spark dataframe

In this post, we will explore how to use Spark with Cassandra, combining Spark’s distributed processing with Cassandra’s scalable, fault-tolerant NoSQL storage. Spark’s integration with Cassandra lets us efficiently read and write Cassandra data through Spark’s APIs and perform data processing and ... Read More →

Requirement: In this post, we will learn how to get the last element of a list in a Spark DataFrame. Solution: Create a DataFrame with dummy data: val df = spark.createDataFrame(Seq( ("1100", "Person1", "Street1#Location1#City1", null), ("1200", "Person2", "Street2#Location2#City2", "Contact2"), ("1300", "Person3", "Street3#Location3#City3", null), ("1400", "Person4", null, "Contact4"), ("1500", "Person5", "Street5#Location5#City5", null) )).toDF("id", ... Read More →
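One way to take the last element of a delimited string column is to split it into an array and index from the end. The sketch below is a minimal illustration and not necessarily the approach the full post takes: the column names `address` and `last_part` are hypothetical, the local `SparkSession` setup is assumed, and `element_at` with a negative index needs Spark 2.4 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at, split}

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("last-element-sketch")
  .getOrCreate()
import spark.implicits._

// Two-column subset of the dummy data; "address" is a hypothetical column name.
val df = Seq[(String, String)](
  ("1100", "Street1#Location1#City1"),
  ("1200", "Street2#Location2#City2"),
  ("1400", null)
).toDF("id", "address")

// split() produces an array column; element_at(..., -1) picks its last entry.
// A null address stays null in the new column.
val withLast = df.withColumn(
  "last_part",
  element_at(split(col("address"), "#"), -1)
)

withLast.show(false)
```

On Spark versions before 2.4, a similar result can probably be reached by indexing the array with `size(...) - 1`, at the cost of a slightly longer expression.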

Requirement: In this post, we will learn how to extract a value from a row. Whenever we extract the value of a column from a row, we get an object as the result. For example, if we have a DataFrame with personal details like id, name, location, ... Read More →
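To make the "object as a result" point concrete, here is a small sketch of reading values out of a `Row`. The sample data and the variable names are hypothetical, and the local `SparkSession` setup is assumed. `Row.get` returns `Any`, while `getString` and `getAs[T]` return typed values.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-value-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with id, name, and location columns.
val people = Seq(
  (1, "Person1", "City1"),
  (2, "Person2", "City2")
).toDF("id", "name", "location")

val firstRow: Row = people.orderBy("id").first()

// get(i) returns Any, so the caller has to cast before using it as a String.
val untyped: Any = firstRow.get(1)

// Typed accessors return the value with its real type.
val nameByIndex: String = firstRow.getString(1)
val idByName: Int = firstRow.getAs[Int]("id")

println(s"untyped=$untyped nameByIndex=$nameByIndex idByName=$idByName")
```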

Requirement: In this post, we are going to learn how to compare the data of two DataFrames in Spark. Consider a scenario where your daily job consumes data from the source system and appends it to the target table, because it is a Delta/Incremental load. There is a possibility to ... Read More →
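A common way to compare two DataFrames with the same schema is `except`, which returns the distinct rows present in one DataFrame but not in the other. The excerpt is cut off before it names a method, so this is only one possible approach; the `source` and `target` data below are hypothetical and the local `SparkSession` setup is assumed.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("compare-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical source and target tables with identical schemas.
val source = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val target = Seq((1, "a"), (2, "x")).toDF("id", "value")

// Rows in source that the target does not contain.
val missingInTarget = source.except(target)

// Rows in target that the source does not contain.
val unexpectedInTarget = target.except(source)

missingInTarget.show()
unexpectedInTarget.show()
```

If both results are empty, the two DataFrames hold the same set of distinct rows; duplicate counts are not compared by this check.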

Requirement: You have a sample DataFrame and want to delete some columns from it. Solution: Step 1: Sample DataFrame. Start the shell with the command below: spark-shell. Note: I am using Spark 2.3. To create a sample DataFrame, please refer to Create-a-spark-dataframe-from-sample-data. After following that post, you can see that the students DataFrame ... Read More →
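Dropping columns can be sketched with `DataFrame.drop`, which accepts one or more column names and returns a new DataFrame. The `students` data and its column names below are hypothetical stand-ins for the DataFrame built in the referenced post, and the local `SparkSession` setup is assumed.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("drop-columns-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical students data; the real columns come from the referenced post.
val students = Seq(
  (1, "Alice", 85, "A"),
  (2, "Bob", 72, "B")
).toDF("id", "name", "marks", "grade")

// drop() returns a new DataFrame; the original is left unchanged.
val trimmed = students.drop("marks", "grade")

trimmed.printSchema()
```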

Requirement: You have a sample DataFrame that you want to load into Parquet files using Scala. Solution: Step 1: Sample DataFrame. Start the shell with the command below: spark-shell. Note: I am using Spark 2.3. To create a sample DataFrame, please refer to Create-a-spark-dataframe-from-sample-data. After following that post, you can see that ... Read More →
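Writing a DataFrame to Parquet can be sketched with `DataFrameWriter.parquet`. The `students` data is a hypothetical stand-in, the output location is a temporary directory chosen only for illustration, and the local `SparkSession` setup is assumed.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-write-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical students data; the real columns come from the referenced post.
val students = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

// Illustrative output location inside a fresh temporary directory.
val outputPath = Files.createTempDirectory("students-parquet").resolve("out").toString

// mode("overwrite") replaces any existing output at the path.
students.write.mode("overwrite").parquet(outputPath)

// Read the files back to confirm the round trip.
val reloaded = spark.read.parquet(outputPath)
reloaded.show()
```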

Requirement: Let’s say we are getting data from multiple sources, but we need to ingest all of it into a single target table. The sources can have different schemas. We want to merge the data and load/save it into a table.
Sample Data
Emp_data1.csv
empno,ename,designation,manager,hire_date,sal,deptno,location
9369,SMITH,CLERK,7902,12/17/1980,800,20,BANGALORE
9499,ALLEN,SALESMAN,7698,2/20/1981,1600,30,HYDERABAD
9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE
9566,TURNER,MANAGER,7839,4/2/1981,2975,20,MUMBAI
9654,MARTIN,SALESMAN,7698,9/28/1981,1250,30,CHENNAI
9369,SMITH,CLERK,7902,12/17/1980,800,20,KOLKATA
... Read More →
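One way to merge DataFrames whose schemas differ is to add each missing column as a null and then union by column name. The sketch below is an illustration under assumptions rather than the post's own solution: the second source's schema is hypothetical (the excerpt shows only Emp_data1.csv), the helper `withAllColumns` is invented here, missing columns are filled as strings, and the local `SparkSession` setup is assumed. `unionByName` is available from Spark 2.3.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("merge-schemas-sketch")
  .getOrCreate()
import spark.implicits._

// Small subsets of employee data; emp2's schema is a hypothetical second source.
val emp1 = Seq(("9369", "SMITH", "BANGALORE")).toDF("empno", "ename", "location")
val emp2 = Seq(("9499", "ALLEN", "30")).toDF("empno", "ename", "deptno")

// Every column that appears in either source, in first-seen order.
val allColumns = (emp1.columns ++ emp2.columns).distinct

// Hypothetical helper: add any column the DataFrame lacks as a null string.
def withAllColumns(df: DataFrame): DataFrame =
  allColumns.foldLeft(df) { (acc, name) =>
    if (acc.columns.contains(name)) acc
    else acc.withColumn(name, lit(null).cast("string"))
  }

// unionByName matches columns by name, so differing column order is fine.
val merged = withAllColumns(emp1).unionByName(withAllColumns(emp2))

merged.show()
```

On Spark 3.1 and later, `unionByName(other, allowMissingColumns = true)` covers the same case without the helper.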