Load a Spark DataFrame into a non-existing Hive table

Requirement:

You have a DataFrame that you want to save into a Hive table for future use, but you do not want to create the Hive table first. Instead, you want to save the DataFrame directly to Hive and let Spark create the table.

Given:

Sample data:

  101, "alex", 88.56
  102, "john", 68.32
  103, "peter", 75.62
  104, "jeff", 92.67
  105, "mathew", 89.56
  106, "alan", 72.57
  107, "steve", 96.12
  108, "mark", 98.45
  109, "adam", 76.25
  109, "david", 78.45

Solution:

Note: Skip Step 1 if you already have a Spark DataFrame.

Step 1: Creation of Spark DataFrame

Open spark-shell.

Note: I am using Spark version 2.3.
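saveAsTable registers the table in the Hive metastore only when the Spark session has Hive support enabled. spark-shell normally enables it when Spark is built with Hive, but it is worth checking before Step 2. A quick sketch for spark-shell, where `spark` is predefined; the name `catalogImpl` is only illustrative:

```scala
// Which catalog does this session use?
// "hive"      -> tables created with saveAsTable are registered in the Hive metastore
// "in-memory" -> tables live only in Spark's own catalog, so the Hive CLI will not see them
val catalogImpl = spark.sparkContext.getConf.get("spark.sql.catalogImplementation", "in-memory")
println(s"Catalog implementation: $catalogImpl")
```

If this prints in-memory, start spark-shell from a Spark build that includes Hive support before continuing.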

Use the code below to create a Spark DataFrame. If you need an explanation of this code, please refer to THIS post.

  import spark.implicits._
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types._

  // Sample rows: (id, name, percentage)
  val stu_rdd = spark.sparkContext.parallelize(Seq(
    Row(101, "alex", 88.56),
    Row(102, "john", 68.32),
    Row(103, "peter", 75.62),
    Row(104, "jeff", 92.67),
    Row(105, "mathew", 89.56),
    Row(106, "alan", 72.57),
    Row(107, "steve", 96.12),
    Row(108, "mark", 98.45),
    Row(109, "adam", 76.25),
    Row(109, "david", 78.45)
  ))

  // Column names paired with their Spark SQL type names
  val schema_list = List(("id", "int"), ("name", "string"), ("percentage", "double"))

  // Build the StructType by adding one field per (name, type) pair
  val schema = schema_list.foldLeft(new StructType())((s, x) => s.add(x._1, x._2))

  val students = spark.createDataFrame(stu_rdd, schema)
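If you do not need to spell out a StructType, a shorter alternative is to build the same sample data from Scala tuples and name the columns with toDF, which comes from spark.implicits._ (spark-shell imports it automatically). This is a sketch for spark-shell, where `spark` is predefined; the name `studentsFromTuples` is only illustrative. The column types are inferred from the tuple element types (Int, String, Double), so they match the schema above, except that the Int and Double columns are marked non-nullable.

```scala
import spark.implicits._

// Same sample data as tuples; toDF assigns the column names in order
val studentsFromTuples = Seq(
  (101, "alex", 88.56),
  (102, "john", 68.32),
  (103, "peter", 75.62),
  (104, "jeff", 92.67),
  (105, "mathew", 89.56),
  (106, "alan", 72.57),
  (107, "steve", 96.12),
  (108, "mark", 98.45),
  (109, "adam", 76.25),
  (109, "david", 78.45)
).toDF("id", "name", "percentage")

// Compare the inferred column types with the explicit schema
studentsFromTuples.printSchema()
```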

Step 2: Saving into Hive

Now that you have the DataFrame “students”, let’s say the table we want to create is “bdp.students_tbl”, where bdp is the name of the database.

Use the code below to save it into Hive.

  students.write.saveAsTable("bdp.students_tbl")
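saveAsTable creates the table for you, but not the database: if bdp does not exist yet, the write fails. With the default save mode (ErrorIfExists), it also fails when the table already exists, for example on a second run. The sketch below, for spark-shell, creates the database first and sets the save mode and storage format explicitly; Overwrite and parquet are illustrative choices, not requirements (parquet is Spark's default data source format unless spark.sql.sources.default says otherwise).

```scala
import org.apache.spark.sql.SaveMode

// saveAsTable does not create databases, so make sure bdp exists first
spark.sql("CREATE DATABASE IF NOT EXISTS bdp")

// Overwrite replaces the table's contents on re-runs;
// use SaveMode.Append instead to add rows to an existing table
students.write
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .saveAsTable("bdp.students_tbl")
```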

Step 3: Output

Go to the Hive CLI and use the query below to check the Hive table:

  SELECT * FROM bdp.students_tbl;
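You can also verify the result without leaving spark-shell. A minimal sketch using spark.sql, spark.table and the catalog API; the output is not shown here because it depends on your environment:

```scala
// Query the new table through the metastore, as the Hive CLI would
spark.sql("SELECT * FROM bdp.students_tbl").show()

// Column names and types as registered for the table
spark.table("bdp.students_tbl").printSchema()

// List the tables in the bdp database to confirm students_tbl was created
spark.catalog.listTables("bdp").show(false)
```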

Wrapping up:

This approach is useful when you need to make the result of a DataFrame available to other applications.
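As an illustration of such reuse, here is a sketch of a separate Spark application that reads bdp.students_tbl back. The object name ReadStudentsTable, the helper topStudents and the 90.0 threshold are hypothetical, and it assumes the application is configured to use the same Hive metastore as spark-shell (for example through the same hive-site.xml in Spark's conf directory).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical standalone job that reuses the table written in Step 2
object ReadStudentsTable {

  // Illustrative query: students whose percentage is above minPct
  def topStudents(spark: SparkSession, minPct: Double): DataFrame =
    spark.table("bdp.students_tbl").filter(col("percentage") > minPct)

  def main(args: Array[String]): Unit = {
    // enableHiveSupport connects this session to the Hive metastore,
    // which is where saveAsTable registered bdp.students_tbl
    val spark = SparkSession.builder()
      .appName("ReadStudentsTable")
      .enableHiveSupport()
      .getOrCreate()

    topStudents(spark, 90.0).show()
    spark.stop()
  }
}
```

Package it and run it with spark-submit rather than calling main from spark-shell, since spark.stop() would also stop the shell's session.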

