Hive Scenario Based Interview Questions with Answers

1. Let’s say a Hive table is created as an external table. If we drop the table, will the data be accessible?

Answer: The data will be accessible even if the table gets dropped. We can get the data from the table’s HDFS location.

2. A Hive table is created as an external table at location say hdfs://usr/data/table_name. If we dump a data set which are having the data as per the table structure, will we able to fetch the records from the table using a select query?

Answer: Yes, we will be able to fetch the records from the table after dumping the data set at the hive table external location.

3. A Hive partition table is created which is partition by a column say yearofexperience. If we create a directory say yearofexperience=3 at the HDFS path of the table and dump the data set which is as per the table structure. Will the data be available if we execute select query on the table?

Answer: No, the data will not accessible by executing the select query on the table. After dumping the data files at table HDFS location for the partition, you will have to update the metadata using below command:

 ALTER TABLE partitioned_test_managed ADD PARTITION (yearofexperience=3) LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/bdp.db/partitioned_test_managed/yearofexperience=3';

4. Let’s take the same previous Hive partition table. If we drop the partition, will we able to access the data?

Answer: If a hive partition created as a managed table, then after dropping the partition, data will also get removed from the path. But in case of an external table, data will be accessible from the same external path of the hive partition table.

5. Let’s take the same previous Hive partition table partitioned by a column named yearofexperience. It is having multiple partitions at the HDFS location. If we drop a partition directory say yearofexperience=3  from the HDFS location, will this partition be listed while querying show partitions on the table?

Answer:  If we drop the partition directory say hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/bdp.db/partitioned_test_external2_parquet/yearofexperience=3 from the HDFS location, it will be listed if you query show partitions on the table.

6. Suppose we have created a Hive partition table which is partitioned by a column named city. We are getting data which are having Empty/Null value for the partition column(city) and have to load these data into the hive table with dynamic partition as it is having multiple city records in the data set. In which partition the records, with an empty value for city column, will be available?

Answer: While loading the data into a table using dynamic partition if any null or empty value comes for a defined partition column, then it uses to create a default partition named __HIVE_DEFAULT_PARTITION__ at HDFS location and dump those records in that partition.

For more details with sample data, refer to this hive_default_partition.

7. Let’s say we have created a Hive partition table. This table gets updated every day with a huge volume of data. As we already know that the table is having a high volume of data, we want to restrict the query not to do a full scan on the table. How will you achieve this?

Answer: You can achieve this by setting below properties:

 set hive.mapred.mode=strict;

Now if you execute a query like:

 hive> select * from partitioned_test_external;
FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "partitioned_test_external" Table "partitioned_test_external"
      
hive> select * from partitioned_test_external where yearofexperience=3;
OK
101 A Z Kolkata 9000000001 3
102 B Y Bangalore 9000000002 3
103 C X Mumbai 9000000003 3
104 D W Pune 9000000004 3
Time taken: 8.192 seconds, Fetched: 4 row(s)

8. If we create a table with an EXTERNAL keyword, but not mentioning any location in the create table statement, which kind of table it will be – managed or external?

Answer: The created table will behave like an external table that means if you drop the table, data will be available at the table HDFS location.

9. We have created a view on top of a Hive table. If we drop the Hive table, will the View be accessible?

Answer: The view will not be accessible. It will throw an error like Table not found.

10. Let’s say you want to create a table which is having columns name like Hive keyword (say, timestamp, date, etc.) or column name having space(say “col 50”). How will you create the table in Hive?

Answer: You can mention the column name enclosed by backticks (`).

For example- `timestamp` string, `col 50` string

 

Subscribe us for getting the update on the new post.

Sharing is caring!

Subscribe to our newsletter
Loading

Leave a Reply