Spark SQL

In this post, we will explore how to use Apache Spark with MongoDB, combining Spark’s distributed processing with MongoDB’s flexible, scalable NoSQL storage. Spark’s integration with MongoDB lets us efficiently read data from and write data to MongoDB through Spark’s APIs and perform data processing … Read More →

In this post, we will explore how to optimize Spark SQL queries to improve their performance. Spark SQL offers various techniques and optimizations to enhance query execution and minimize resource usage. Problem: We want to improve the performance of Spark SQL queries by implementing optimization techniques and best practices. Solution: … Read More →

Requirement: In our previous post, we learned about the Temporary View in Databricks. In this post, we are going to learn about the Global View in Databricks, or more generally in Spark. We will also see when to create a temporary view and how to access it. Solution: For this exercise, … Read More →

Requirement: In this post, we will learn how to get the last element of a list in a Spark dataframe. Solution: Create a dataframe with dummy data: val df = spark.createDataFrame(Seq( ("1100", "Person1", "Street1#Location1#City1", null), ("1200", "Person2", "Street2#Location2#City2", "Contact2"), ("1300", "Person3", "Street3#Location3#City3", null), ("1400", "Person4", null, "Contact4"), ("1500", "Person5", "Street5#Location5#City5", null) )).toDF("id", … Read More →

Requirement: In this post, we are going to learn how to compare the data in data frames in Spark. Consider a scenario where your daily job consumes data from the source system and appends it to the target table as a Delta/incremental load. There is a possibility to … Read More →

Requirement: Let’s say we are getting data from two different sources (i.e., an RDBMS table and a file), and we need to merge the data into a single dataframe. Both sources have the same schema.

Sample Data

MySQL Table Data:
empno,ename,designation,manager,hire_date,sal,deptno
7369,SMITH,CLERK,7902,12/17/1980,800,20
7499,ALLEN,SALESMAN,7698,2/20/1981,1600,30
7521,WARD,SALESMAN,7698,2/22/1981,1250,30
7566,TURNER,MANAGER,7839,4/2/1981,2975,20
7654,MARTIN,SALESMAN,7698,9/28/1981,1250,30

CSV File Data:
empno,ename,designation,manager,hire_date,sal,deptno
… Read More →