Hive Most Asked Interview Questions With Answers – Part I

  1. What is Hive and why is it useful?

Hive is a data warehouse system built on top of Hadoop that stores data in a structured format. It is used for querying and managing large datasets, and it provides a SQL-like query language called HiveQL (HQL) to access the data.

  2. What are the advantages of Hive?

Advantages:

  • Hive works on top of distributed storage such as HDFS.
  • It supports a variety of storage formats (Text, SequenceFile, Parquet, ORC).
  • It also enables an easy ETL (Extract, Transform, Load) process.

  3. What are the execution engines of Hive?

The two main Hive execution engines are MapReduce and Tez (some Hive releases can also run on Spark). The engine is selected with the hive.execution.engine property.

Hive on Tez represents a query as a single Directed Acyclic Graph (DAG) of tasks instead of a chain of separate MapReduce jobs, which involve many synchronization barriers and a lot of I/O overhead. Tez reduces this overhead by passing intermediate data directly between stages, keeping it in memory where possible, instead of writing it to HDFS after every job.

  4. What are the different file formats supported in Hive?

The file formats supported by Hive are:

  • Text
  • SequenceFile
  • ORC
  • Parquet
  • Avro

  5. What is the metastore in Hive?

The metastore in Hive is a central repository that stores information about databases, tables, and their schemas (such as columns, data types, and partitions). This information is called metadata.

  6. What are the different types of tables available in Hive?

There are two types of tables in Hive:

  • Managed Table (Internal Table)
  • External Table

  7. When should you use external and managed tables in Hive?

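Managed Table: Hive controls the lifecycle of the data in this table. This is the default table type in Hive. When you load data into a managed table, it is stored in a subdirectory under the warehouse directory, which is set by the property hive.metastore.warehouse.dir in the configuration file hive-site.xml (default value: /user/hive/warehouse). Dropping a managed table deletes both the metadata and the data, so use it when Hive alone should manage the data.

External Table: Hive does not assume that it owns the data in this table. The data lives at an external location, so dropping the table removes only the metadata and leaves the data files in place. Use it when the data is shared with other tools or must survive a DROP TABLE.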
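
The difference in ownership can be illustrated with a small Python model. This is only a sketch of the behaviour, not Hive code: the `metastore` dictionary, the `drop_table` helper, the table names, and the directory names are all hypothetical stand-ins, and a temporary local directory plays the role of HDFS.

```python
import shutil
import tempfile
from pathlib import Path


def drop_table(metastore: dict, name: str) -> None:
    """Toy model of DROP TABLE: always forget the metadata,
    but delete the data files only when the table is managed."""
    table = metastore.pop(name)
    if table["managed"]:
        shutil.rmtree(table["location"])


root = Path(tempfile.mkdtemp())
warehouse = root / "warehouse"              # stands in for hive.metastore.warehouse.dir
external_dir = root / "landing" / "sales"   # hypothetical external location

# Register one managed and one external table, each with one data file.
metastore = {}
for name, location, managed in [
    ("sales_managed", warehouse / "sales_managed", True),
    ("sales_external", external_dir, False),
]:
    location.mkdir(parents=True)
    (location / "part-00000").write_text("1,apple\n")
    metastore[name] = {"location": location, "managed": managed}

drop_table(metastore, "sales_managed")
drop_table(metastore, "sales_external")

managed_data_exists = (warehouse / "sales_managed").exists()
external_data_exists = (external_dir / "part-00000").exists()
print(managed_data_exists, external_data_exists)  # False True
```

After both drops the metadata is gone for both tables, but only the external table's file is still on disk.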

  8. What is SerDe in Hive and what is it used for?

SerDe (Serializer/Deserializer) is a library interface in Hive that is used to serialize and deserialize data. It tells Hive how to read records from a table's underlying files and how to write them back to an HDFS location, which lets users work with data in any custom format.
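
The read and write paths of a SerDe can be sketched in a few lines of Python. This is a conceptual model only, not the Hive SerDe API: the column list, the comma delimiter, and the function names `deserialize` and `serialize` are assumptions made for illustration.

```python
from typing import Dict, List

COLUMNS: List[str] = ["id", "name", "city"]  # hypothetical table schema
DELIMITER = ","                              # assumed field delimiter


def deserialize(line: str) -> Dict[str, str]:
    """Read path: turn one stored record into a row of named columns."""
    return dict(zip(COLUMNS, line.rstrip("\n").split(DELIMITER)))


def serialize(row: Dict[str, str]) -> str:
    """Write path: turn a row back into the on-disk record format."""
    return DELIMITER.join(row[col] for col in COLUMNS)


stored_line = "1,Asha,Pune"
row = deserialize(stored_line)
print(row)             # {'id': '1', 'name': 'Asha', 'city': 'Pune'}
print(serialize(row))  # 1,Asha,Pune
```

Swapping in a different pair of functions (for example, ones that parse JSON) changes the storage format without changing how the rows look to a query, which is the role a SerDe plays for a Hive table.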

Hive has the following built-in SerDes:

  • JsonSerDe
  • CSV
  • Parquet and ORC
  • Thrift, etc.

  9. What is a partition in Hive? Why is it important?

See the following post:

http://bigdataprogrammers.com/partition-in-hive/

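  10. What are the different types of partitions in Hive?

There are two types of partitions in Hive: static and dynamic.

Static: Used when the partition value is known in advance and is specified explicitly when loading the data.

Dynamic: Used when the partition value is not known in advance; Hive derives it from the value of the partition column in each row at load time.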
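
The two modes can be compared with a small Python model of how rows end up in partition directories (Hive names each partition directory `column=value`). This is a sketch, not Hive code: the `country` partition column, the sample rows, and the helper names `static_load` and `dynamic_load` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def static_load(rows: List[Tuple[int, str]], country: str) -> Dict[str, list]:
    """Static: the caller names the partition; every row goes into it."""
    return {f"country={country}": list(rows)}


def dynamic_load(rows: List[Tuple[int, str, str]]) -> Dict[str, list]:
    """Dynamic: the partition is derived from the last column of each row."""
    partitions = defaultdict(list)
    for row_id, name, country in rows:
        partitions[f"country={country}"].append((row_id, name))
    return dict(partitions)


# Static: we already know these rows all belong to country=IN.
static_parts = static_load([(1, "Asha"), (3, "Chen")], "IN")

# Dynamic: the country travels with each row and decides the partition.
dynamic_parts = dynamic_load([(1, "Asha", "IN"), (2, "Bob", "US"), (3, "Chen", "IN")])

print(static_parts)
print(dynamic_parts)
```

In HiveQL the same distinction shows up in the insert statement: a static insert names the value, as in `PARTITION (country='IN')`, while a dynamic insert names only the column, as in `PARTITION (country)`, and takes the value from the last column of the selected rows.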

The following post has more details about static and dynamic partitions:

Static and Dynamic Partitions

 

Check Hive Most Asked Interview Questions With Answers – Part II
