DataIngestion

Requirement: In this post, we will learn how to convert a table's schema into a DataFrame in Spark.
Sample Data:
empno  ename  designation  manager  hire_date   sal   deptno  location
9369   SMITH  CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN  SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
9521   WARD   SALESMAN     7698     2/22/1981   …

Requirement: In this post, we will learn how to handle NULL values in a Spark DataFrame. There are multiple ways to handle NULL during data processing; we will see how to do it with the Spark DataFrame API.
Solution: Create a DataFrame with dummy data:
val df = spark.createDataFrame(Seq( (1100, "Person1", "Location1", null), (1200, …

Requirement: In this post, we will learn how to select a specific column value or all the columns in a Spark DataFrame using different approaches.
Sample Data:
empno  ename  designation  manager  hire_date   sal   deptno  location
9369   SMITH  CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN  SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
…

Requirement: A UDF is a user-defined function. As its name indicates, a user can create a custom function and use it wherever required. We create a UDF when the existing built-in functions are not available or cannot fulfill the requirement.
Sample Data:
empno  ename  designation  manager  hire_date  sal  deptno  …

Requirement: In this post, we are going to import data from an RDBMS into Hadoop. Here, MySQL is the RDBMS, and we will use Sqoop to import the data into Hadoop.
Components Involved:
MySQL – For source data
HDFS – To store source data in Hadoop
Sqoop – …