Spark (Page 3)

Requirement: Let’s say we have a data file with a TSV extension. It is much like a CSV file. What is the difference between CSV and TSV? The difference is the character that separates the data in the file: a CSV file stores data separated by “,”, whereas a TSV file stores data separated… Read More →
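The delimiter difference described in the excerpt above can be sketched in plain Scala, without Spark. This is a minimal illustration, not the post's own solution: `DelimiterSketch`, `parseLine`, and the sample values are hypothetical names and data chosen for the example. It only shows that the same record parses to the same fields once the right separator is used.

```scala
import java.util.regex.Pattern

object DelimiterSketch {
  // Split one line of delimited text into its fields.
  // Pattern.quote treats the delimiter literally (String.split expects a regex),
  // and the -1 limit keeps trailing empty fields instead of dropping them.
  def parseLine(line: String, delimiter: Char): Array[String] =
    line.split(Pattern.quote(delimiter.toString), -1)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample record, written once as CSV and once as TSV.
    val csvLine = "1100,Person1,City1"
    val tsvLine = "1100\tPerson1\tCity1"

    val fromCsv = parseLine(csvLine, ',')
    val fromTsv = parseLine(tsvLine, '\t')

    println(fromCsv.mkString(" | "))
    println(fromTsv.mkString(" | "))
    // Both lines yield the same three fields; only the separator differs.
    println(fromCsv.sameElements(fromTsv))
  }
}
```

In Spark, the CSV reader's `sep` option plays the role of the `delimiter` argument here, so a tab-separated file can usually be read with the same reader as a comma-separated one.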

In this post, we will walk through the Databricks Notebook, the area of Databricks where the code lives. Overview: The Databricks Notebook is a kind of document that keeps all commands and visualizations in cells. You can either create a single cell for the entire code or keep an… Read More →

Requirement: In this post, we will learn how to get the last element of a list in a DataFrame in Spark. Solution: Create a DataFrame with dummy data: val df = spark.createDataFrame(Seq( ("1100", "Person1", "Street1#Location1#City1", null), ("1200", "Person2", "Street2#Location2#City2", "Contact2"), ("1300", "Person3", "Street3#Location3#City3", null), ("1400", "Person4", null, "Contact4"), ("1500", "Person5", "Street5#Location5#City5", null) )).toDF("id",… Read More →

Requirement: In this post, we will learn how to get or extract a value from a row. Whenever we extract a column's value from a row, we get an object as the result. For example, if we have a DataFrame with personal details like id, name, location,… Read More →

Requirement: In this post, we are going to learn how to compare the data of DataFrames in Spark. Consider a scenario where your daily job consumes data from the source system and appends it to the target table, because it is a Delta/Incremental load. There is a possibility to… Read More →