Hive Most Asked Interview Questions With Answers

What is Hive and why is it useful?

Hive is a data warehouse system built on top of Hadoop in which data is stored in a structured format. It is used for querying and managing large datasets. It provides a SQL-like interface, called HiveQL (HQL), to access the data.

What are the advantages of Hive?

Advantages:

Hive keeps its data on distributed storage such as HDFS.
It supports a variety of storage formats (Text, Sequence, Parquet, ORC).
It also makes the ETL (Extract, Transform, Load) process easy.

What are the execution engines of Hive?

The two commonly used Hive execution engines are MapReduce and Tez. Hive can also be configured to run on Spark.

Hive on Tez represents a query as a single Directed Acyclic Graph (DAG) instead of multiple stages of MapReduce programs, which involve a lot of synchronization barriers and I/O overhead. Tez reduces this overhead by keeping intermediate data in memory where possible instead of writing it to disk between stages.
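
As a small illustration, the engine can be chosen per session through the hive.execution.engine property. This is only a sketch: it assumes Tez is installed on the cluster, and the employee table with its dept column is a hypothetical example.

```sql
-- Switch the execution engine for the current session.
-- Valid values are mr, tez, and spark (each must be set up on the cluster).
SET hive.execution.engine=tez;

-- Print the value currently in effect.
SET hive.execution.engine;

-- EXPLAIN shows the query plan without running the query.
-- "employee" and "dept" are hypothetical names.
EXPLAIN
SELECT dept, COUNT(*) AS employees
FROM employee
GROUP BY dept;
```

With Tez selected, the plan is typically shown as one DAG of vertices rather than a chain of separate MapReduce jobs.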

What are the different file formats supported in Hive?

The file formats supported by Hive are:

Text
Sequence
ORC
Parquet
Avro
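
The format is chosen per table with the STORED AS clause. The sketch below assumes two hypothetical tables, sales_text and sales_orc, and shows one way to convert text data to ORC.

```sql
-- Plain-text table with comma-separated fields (hypothetical table).
CREATE TABLE sales_text (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Same columns stored as ORC.
-- SEQUENCEFILE, PARQUET, and AVRO can be used in the same position.
CREATE TABLE sales_orc (id INT, amount DOUBLE)
STORED AS ORC;

-- Copy the rows; Hive rewrites them in the target table's format.
INSERT INTO TABLE sales_orc
SELECT id, amount FROM sales_text;
```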

What is the metastore in Hive?

The metastore in Hive is a central repository where information about databases, tables, and their relationships is stored. This information is called metadata.
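
The following statements read from the metastore rather than from the table data. The table name employee is a hypothetical example.

```sql
-- List the databases registered in the metastore.
SHOW DATABASES;

-- List the tables in the default database.
SHOW TABLES IN default;

-- Show the metadata of one table: columns, location, table type,
-- SerDe, and input/output formats. "employee" is hypothetical.
DESCRIBE FORMATTED employee;
```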

What are the different types of tables available in Hive?

There are two types of tables in Hive:

Managed Table (Internal Table)
External Table

When should you use external and managed tables in Hive?

Managed Table: Hive controls the lifecycle of the data in this type of table. This is the default table type in Hive. When you load data into a managed table, it is stored in a subdirectory under the warehouse directory, which is defined by the property hive.metastore.warehouse.dir in the configuration file hive-site.xml (default value: /user/hive/warehouse). Dropping a managed table deletes both its metadata and its data, so use it when Hive alone should own the data.

External Table: Hive does not assume that it owns the data in this type of table. The data resides at an external location, and dropping the table removes only the metadata. Use it when the data is shared with other tools or must remain after the table is dropped.
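
The difference shows up in the DDL and in what DROP TABLE does. This is a minimal sketch; the table names and the HDFS path /data/raw/orders are assumptions made for illustration.

```sql
-- Managed table: data lives under the Hive warehouse directory.
CREATE TABLE orders_managed (order_id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: Hive only records where the data is.
-- The LOCATION path is a hypothetical example.
CREATE EXTERNAL TABLE orders_external (order_id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/orders';

-- Removes the metadata and the data files.
DROP TABLE orders_managed;

-- Removes the metadata only; the files under /data/raw/orders remain.
DROP TABLE orders_external;
```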

What is SerDe in Hive and what is it used for?

SerDe (Serializer/Deserializer) is a library in Hive. It is used to serialize and deserialize data. It allows Hive to read table data stored in HDFS in a custom format and to write data back in that format.

Hive has the following built-in SerDes:

JsonSerDe
CSV
Parquet and ORC
Thrift etc
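
A SerDe is attached to a table with ROW FORMAT SERDE. The sketch below uses the built-in CSV SerDe class org.apache.hadoop.hive.serde2.OpenCSVSerde; the table name and its columns are hypothetical.

```sql
-- CSV-backed table read through the built-in OpenCSVSerde.
-- This SerDe treats every column as STRING.
CREATE TABLE customers_csv (id STRING, name STRING, city STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE;
```

Formats such as ORC and Parquet select their SerDe automatically when the table is declared with STORED AS ORC or STORED AS PARQUET.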

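What is a partition in Hive? Why is it important?

Partitioning divides a table into parts based on the values of one or more columns, and each partition is stored in its own directory. A query that filters on the partition column reads only the matching partitions instead of scanning the whole table. For more details, see the post below: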

https://bigdataprogrammers.com/partition-in-hive/
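
As a rough illustration of partitioning, the sketch below creates a partitioned table and runs a query that filters on the partition column. The table page_views and its columns are hypothetical.

```sql
-- Each distinct view_date value gets its own subdirectory.
CREATE TABLE page_views (user_id INT, url STRING)
PARTITIONED BY (view_date STRING);

-- The filter on the partition column lets Hive read only the
-- matching partition directory.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2024-01-01'
GROUP BY url;

-- List the partitions that exist for the table.
SHOW PARTITIONS page_views;
```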

What are the different types of partitions in Hive?

There are two types of partitions in Hive: Static and Dynamic.

Static: We use it when we know the partition value in advance and specify it in the load or insert statement.

Dynamic: We use it when we do not know the partition value in advance; Hive derives it from the data being inserted.
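
The two styles differ only in how the PARTITION clause is written. This is a sketch under assumptions: sales_by_country and sales_staging are hypothetical tables, and sales_staging is assumed to have the columns id, amount, and country.

```sql
CREATE TABLE sales_by_country (id INT, amount DOUBLE)
PARTITIONED BY (country STRING);

-- Static partition: the value is written in the statement.
INSERT INTO TABLE sales_by_country PARTITION (country = 'US')
SELECT id, amount
FROM sales_staging
WHERE country = 'US';

-- Dynamic partition: enable it, then let Hive take the value
-- from the last column of the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE sales_by_country PARTITION (country)
SELECT id, amount, country
FROM sales_staging;
```

In the dynamic case the partition column must come last in the SELECT list, and nonstrict mode is needed because no static partition value is given.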

The post "Static and Dynamic Partitions" has more details about static and dynamic partitions.

Check Hive Most Asked Interview Questions With Answers – Part II
