Requirement:
You have a dataframe that you want to save into a Hive table for future use, but you do not want to create the Hive table first. Instead, you want to save the dataframe directly to Hive and let Spark create the table.
Given:
Sample data:
101, "alex", 88.56
102, "john", 68.32
103, "peter", 75.62
104, "jeff", 92.67
105, "mathew", 89.56
106, "alan", 72.57
107, "steve", 96.12
108, "mark", 98.45
109, "adam", 76.25
109, "david", 78.45
Solution:
Note: Skip Step 1 if you already have a Spark dataframe.
Step 1: Creating the Spark dataframe
Open spark-shell.
Note: I am using Spark 2.3.
Use the code below to create the Spark dataframe. If you need an explanation of this code, please refer to THIS post.
import spark.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

var stu_rdd = spark.sparkContext.parallelize(Seq(
  Row(101, "alex", 88.56),
  Row(102, "john", 68.32),
  Row(103, "peter", 75.62),
  Row(104, "jeff", 92.67),
  Row(105, "mathew", 89.56),
  Row(106, "alan", 72.57),
  Row(107, "steve", 96.12),
  Row(108, "mark", 98.45),
  Row(109, "adam", 76.25),
  Row(109, "david", 78.45)
))

var schema_list = List(("id", "int"), ("name", "string"), ("percentage", "double"))
var schema = new StructType()
schema_list.map(x => schema = schema.add(x._1, x._2))

var students = spark.createDataFrame(stu_rdd, schema)
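For a dataset this small, the same dataframe can probably be built more directly with `toDF`. The following is only a sketch of that alternative, not the approach used in this post. It assumes a spark-shell session (where `spark` is already defined), and the name `students_alt` is hypothetical, chosen so it does not clash with `students`.

```scala
// Assumes spark-shell, where the `spark` session already exists.
import spark.implicits._

// `students_alt` is a hypothetical name used only for this illustration.
val students_alt = Seq(
  (101, "alex", 88.56),
  (102, "john", 68.32),
  (103, "peter", 75.62),
  (104, "jeff", 92.67),
  (105, "mathew", 89.56),
  (106, "alan", 72.57),
  (107, "steve", 96.12),
  (108, "mark", 98.45),
  (109, "adam", 76.25),
  (109, "david", 78.45)
).toDF("id", "name", "percentage")

// Inspect the inferred column types before saving anything.
students_alt.printSchema()
```

One difference to keep in mind: with this approach the column types are inferred from the Scala tuple, so nullability of the numeric columns may differ from the explicitly built schema.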
Step 2: Saving into Hive
You now have the dataframe "students". Let's say the table we want to create is "bdp.students_tbl", where bdp is the name of the database.
Use the code below to save it into Hive.
students.write.saveAsTable("bdp.students_tbl")
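By default, `saveAsTable` fails if the table already exists, and it expects the target database to be present. A slightly more defensive variant might look like the sketch below. It assumes the `students` dataframe from Step 1 and a spark-shell session with Hive support. The choice of `SaveMode.Overwrite` and the Parquet format are assumptions made for illustration; pick whatever mode and format suit your case.

```scala
import org.apache.spark.sql.SaveMode

// Create the target database if it is not there yet.
spark.sql("CREATE DATABASE IF NOT EXISTS bdp")

students.write
  .mode(SaveMode.Overwrite)   // assumption: replace the table if it already exists
  .format("parquet")          // assumption: store the table data as Parquet
  .saveAsTable("bdp.students_tbl")
```

Note that `SaveMode.Overwrite` drops the existing contents of the table, so use `SaveMode.Append` instead if you want to keep the rows that are already there.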
Step 3: Output
Go to the Hive CLI and run the query below to check the Hive table.
select * from bdp.students_tbl;
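The table can also be checked without leaving spark-shell. The sketch below assumes the same session that wrote the table and that `bdp.students_tbl` was saved as in Step 2. The name `saved` is hypothetical.

```scala
// Read the saved table back as a dataframe; `saved` is a hypothetical name.
val saved = spark.table("bdp.students_tbl")

// Show the schema that was stored for the table.
saved.printSchema()

// Display the rows, ordered by id for easier reading.
spark.sql("SELECT * FROM bdp.students_tbl ORDER BY id").show()
```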
Wrapping up:
This approach is useful when the result of a dataframe needs to be used by other applications.