hadoop - Hive: How to show all partitions of a table?

Question

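I have a table with more than 1,000 partitions.

The "show partitions" command lists only a small number of them.

How can I show all partitions?

Update:

I found that the "show partitions" command lists only 500 partitions.
"select ... where ..." also processes only 500 partitions!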

score 92 · Accepted Answer

The CLI has some limits when displaying output. I suggest exporting the output to a local file:

$ hive -e 'show partitions table_name;' > partitions
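
Once the list has been exported, it can be checked programmatically. The following is a minimal sketch, assuming the file contains one partition spec per line in the usual `col=value/col=value` form, and assuming special characters in values are percent-encoded. The helper names `parse_partition_spec` and `load_partitions` are hypothetical, not part of Hive.

```python
from urllib.parse import unquote


def parse_partition_spec(line):
    """Turn 'day=2016-08-16/file_date=20160816' into a dict of column -> value."""
    spec = {}
    for part in line.strip().split("/"):
        key, _, value = part.partition("=")
        # Assumption: special characters in partition names are percent-encoded.
        spec[unquote(key)] = unquote(value)
    return spec


def load_partitions(path):
    """Read one partition spec per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [parse_partition_spec(line) for line in handle if line.strip()]
```

Calling `len(load_partitions("partitions"))` on the exported file gives a quick way to confirm that the count matches the number of partitions you expect, rather than a truncated listing.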

score 25 · Accepted Answer

hive> show partitions table_name;

answered 2018-06-27T10:25:03.587

score 4 · Accepted Answer

OK, I'm writing this answer by expanding on wmky's answer above, assuming that you have configured MySQL for your metastore instead of Derby.

select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='<table_name>');

The query above gives you all possible values of the partition columns.

Example:

hive> desc clicks_fact;
OK
time                    timestamp                                   
..                              
day                     date                                        
file_date               varchar(8)                                  

# Partition Information      
# col_name              data_type               comment             

day                     date                                        
file_date               varchar(8)                                  
Time taken: 1.075 seconds, Fetched: 28 row(s)

I'll fetch the values of the partition columns.

mysql> select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='clicks_fact');
+-----------------------------------+
| PART_NAME                         |
+-----------------------------------+
| day=2016-08-16/file_date=20160816 |
| day=2016-08-17/file_date=20160816 |
....
....
| day=2017-09-09/file_date=20170909 |
| day=2017-09-08/file_date=20170909 |
| day=2017-09-09/file_date=20170910 |
| day=2017-09-10/file_date=20170910 |
+-----------------------------------+

1216 rows in set (0.00 sec)

This returns all the partitions, one row per combination of partition column values.

Note: JOIN the table DBS ON DB_ID when a database is also involved (i.e., when multiple databases have the same table_name).
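
The join with DBS can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL metastore, the three tables carry only the columns needed here (DB_ID, NAME, TBL_ID, TBL_NAME, PART_NAME), and the `staging` database and the `list_partitions` helper are made up for the example.

```python
import sqlite3

# In-memory stand-in for the metastore database (assumption: the real one is MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
INSERT INTO DBS VALUES (1, 'default'), (2, 'staging');
INSERT INTO TBLS VALUES (10, 1, 'clicks_fact'), (20, 2, 'clicks_fact');
INSERT INTO PARTITIONS VALUES
  (100, 10, 'day=2016-08-16/file_date=20160816'),
  (101, 10, 'day=2016-08-17/file_date=20160816'),
  (200, 20, 'day=2017-09-09/file_date=20170909');
""")

# Join PARTITIONS -> TBLS -> DBS so that the database name disambiguates the table.
QUERY = """
SELECT p.PART_NAME
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE d.NAME = ? AND t.TBL_NAME = ?
ORDER BY p.PART_NAME
"""


def list_partitions(connection, db_name, table_name):
    """Return the partition names of one table in one database."""
    rows = connection.execute(QUERY, (db_name, table_name)).fetchall()
    return [row[0] for row in rows]
```

The same SELECT should carry over to MySQL, although the parameter placeholder depends on the driver (SQLite uses `?`, while MySQL drivers typically use `%s`).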

score 3 · Accepted Answer

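You can see the Hive Metastore's table and partition information in the "PARTITIONS" table. You can join "TBLS" with "PARTITIONS" to query the partitions of a specific table.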

score 0 · Accepted Answer

Another option is to communicate with the Hive Metastore via the Thrift protocol.
If you write code in Python, you may benefit from the hmsclient library:

Hive CLI:

hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds

hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds

Python CLI:

>>> from hmsclient import hmsclient
>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
...    all_partitions = c.get_partitions(db_name='default',
...                                      tbl_name='test_table_with_partitions', 
...                                      max_parts=24 * 365 * 3)
...
>>> print([{'dt': part.values[0]} for part in all_partitions])
[{'dt': '20210504'}, {'dt': '20210505'}]

Note: max_parts is a parameter that cannot be greater than 32767 (the Java short max value).
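
The example above hard-codes the partition column name `dt`. A slightly more general sketch, assuming the client exposes the Thrift calls `get_table` and `get_partitions` and that the table object carries its partition columns in `partitionKeys`, could look like this. The helper name `partitions_as_dicts` and the constant `MAX_PARTS_LIMIT` are hypothetical.

```python
MAX_PARTS_LIMIT = 32767  # largest value of a signed 16-bit integer (Java short)


def partitions_as_dicts(client, db_name, tbl_name, max_parts=MAX_PARTS_LIMIT):
    """Return each partition as a dict of partition column name -> value."""
    # Clamp the request so it never exceeds the 16-bit limit noted above.
    max_parts = min(max_parts, MAX_PARTS_LIMIT)
    # Assumption: arguments are passed positionally as (database, table[, max_parts]).
    table = client.get_table(db_name, tbl_name)
    keys = [field.name for field in table.partitionKeys]
    partitions = client.get_partitions(db_name, tbl_name, max_parts)
    return [dict(zip(keys, part.values)) for part in partitions]
```

Used inside the `with client as c:` block from the example, `partitions_as_dicts(c, 'default', 'test_table_with_partitions')` should produce the same kind of list of `{'dt': ...}` dictionaries without naming the column by hand.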

If you have Airflow installed together with the apache.hive extra, you can create an hmsclient quite easily:

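# Assumed import path for the Airflow 2 apache-hive provider; older releases differ.
from airflow.providers.apache.hive.hooks.hive import HiveMetastoreHook

hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
    ...  # your code goes here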

This seems a more appropriate way of communicating with the Hive Metastore than accessing the DB directly (and it is database-engine agnostic, BTW).
