Load hive table into spark using Scala

Requirement

Assume you have a Hive table named reports, and you need to process this dataset in Spark. Once the Hive table's data is in a Spark data frame, we can transform it further as per the business needs. So let's try to load a Hive table into a Spark data frame.


Solution

Please follow the steps below:

Step 1: Sample table in hive

Let’s create the table “reports” in Hive. I am using the bdp schema, in which I am creating the table.
Enter the Hive CLI and use the commands below to create the table.

 
 
  CREATE SCHEMA bdp;
  CREATE TABLE bdp.reports(id INT, days INT, year INT);
  INSERT INTO TABLE bdp.reports VALUES (121,232,2015),(122,245,2015),(123,134,2014),(126,67,2016),(182,122,2016),(137,92,2015),(101,311,2015);


Step 2: Check table data

Enter the command below to see the records you inserted.

 
 
  SELECT * FROM bdp.reports;


Step 3: Data Frame Creation

Open the Spark Scala shell using the command below:

 
 
  spark-shell

Please check whether an SQL context with Hive support is available. When the shell starts, it prints “Created SQL context (with Hive support). SQL context available as sqlContext.” near the bottom. This means you can use the sqlContext object to interact with Hive.

Now create a data frame named hiveReports using the command below.

 
 
  val hiveReports = sqlContext.sql("select * from bdp.reports")

Pass your Hive query to sqlContext.sql; whatever data the query returns becomes available in the data frame.

Step 4: Output

Check whether the reports dataset has loaded into the hiveReports data frame using the commands below.

To check the schema:

  hiveReports.printSchema()

To see the data:

  hiveReports.show()

It will show the same output that we got in Step 2.

You can use this data frame further to join with another dataset, to filter rows, or to perform other transformations as per your needs.
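As a rough illustration of such transformations, the seven rows inserted in Step 1 can be modeled in plain Scala to sketch the kind of filter and aggregate you might run on hiveReports. The Report case class and sample values below are assumptions reconstructed from Step 1, not part of the tutorial's Spark session:

```scala
// Hypothetical plain-Scala model of the rows inserted into bdp.reports in Step 1.
case class Report(id: Int, days: Int, year: Int)

val reports = Seq(
  Report(121, 232, 2015), Report(122, 245, 2015), Report(123, 134, 2014),
  Report(126, 67, 2016),  Report(182, 122, 2016), Report(137, 92, 2015),
  Report(101, 311, 2015)
)

// Keep only the 2015 rows -- analogous to hiveReports.filter("year = 2015").
val y2015 = reports.filter(_.year == 2015)

// Total days per year -- analogous to hiveReports.groupBy("year").sum("days").
val daysPerYear = reports.groupBy(_.year).map { case (y, rs) => (y, rs.map(_.days).sum) }

println(y2015.size)        // 4
println(daysPerYear(2015)) // 880
```

On the actual data frame, the equivalents would be hiveReports.filter("year = 2015") and hiveReports.groupBy("year").sum("days").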

Keep learning.
