hadoop - Is there a way to compute statistics for all partitions of a Hive table with a single ANALYZE command?

Question

The syntax I have seen in Hive for computing statistics seems to suggest that the answer to the title question is "no":

ANALYZE TABLE [TABLENAME] PARTITION(parcol1=…, partcol2=….) COMPUTE STATISTICS

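Still, I wanted to throw it out here, because it is surprising that you would always have to write a script that iterates over the partitions to generate a statement for each one. We currently have about a thousand partitions on this small table, and that number will grow by orders of magnitude.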

By the way, I tried the following without specifying a partition:

hive> analyze table metrics compute statistics;
FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed
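
For reference, the per-partition script described above could look roughly like the sketch below. It assumes the partition list comes from the output of SHOW PARTITIONS, one partition per line in the form key=value/key=value, and that values contain no quotes or escaped characters. The helper names partition_spec and analyze_statements are made up for illustration, and quoting every value as a string is an assumption that may need adjusting for non-string partition columns.

```python
def partition_spec(line):
    """Convert one SHOW PARTITIONS line, e.g. 'day=20150831/hour=01',
    into the body of a PARTITION(...) clause."""
    pairs = []
    for key_value in line.strip().split("/"):
        key, _, value = key_value.partition("=")
        # Assumption: every value is quoted as a string literal.
        pairs.append(f"{key}='{value}'")
    return ", ".join(pairs)


def analyze_statements(table, show_partitions_output):
    """Build one ANALYZE statement per non-empty line of partition output."""
    statements = []
    for line in show_partitions_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        spec = partition_spec(line)
        statements.append(
            f"ANALYZE TABLE {table} PARTITION({spec}) COMPUTE STATISTICS;"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical partition listing for the 'metrics' table.
    sample = "day=20150831\nday=20150901\n"
    for statement in analyze_statements("metrics", sample):
        print(statement)
```

The generated statements can be written to a file and run with the Hive CLI. The script only builds text and never talks to Hive itself.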

score 11 · Accepted Answer

Answered 2014-11-12T14:12:36.127

score 5 · Accepted Answer

I am on the latest Hive 1.2, and the command below works fine:

hive> analyze table member partition(day) compute statistics noscan;
Partition mobi_mysql.member{day=20150831} stats: [numFiles=7, numRows=-1, totalSize=4735943322, rawDataSize=-1]
Partition mobi_mysql.member{day=20150901} stats: [numFiles=7, numRows=117512, totalSize=19741804, rawDataSize=0]
Partition mobi_mysql.member{day=20150902} stats: [numFiles=7, numRows=-1, totalSize=17734601, rawDataSize=-1]
Partition mobi_mysql.member{day=20150903} stats: [numFiles=7, numRows=-1, totalSize=13091084, rawDataSize=-1]
OK
Time taken: 2.089 seconds
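
Note that with NOSCAN Hive does not scan the files, so it only gathers the number of files and the physical size. That would explain why most partitions above report numRows=-1. A small sketch for spotting those partitions in the command output follows. It assumes the log lines have exactly the "Partition table{spec} stats: [...]" shape shown above and that a negative numRows means the row count was not computed; the function names are hypothetical.

```python
import re

# Matches lines like:
# Partition db.table{day=20150831} stats: [numFiles=7, numRows=-1, ...]
LINE_RE = re.compile(
    r"^Partition (?P<table>\S+?)\{(?P<spec>[^}]*)\} stats: \[(?P<stats>[^\]]*)\]$"
)


def parse_stats_line(line):
    """Return (table, partition_spec, stats_dict), or None if the line
    is not a partition stats line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stats = {}
    for item in match.group("stats").split(","):
        key, _, value = item.strip().partition("=")
        stats[key] = int(value)
    return match.group("table"), match.group("spec"), stats


def partitions_missing_row_counts(output):
    """List partition specs whose numRows is negative (assumed: not computed)."""
    missing = []
    for line in output.splitlines():
        parsed = parse_stats_line(line)
        if parsed is not None and parsed[2].get("numRows", -1) < 0:
            missing.append(parsed[1])
    return missing
```

The partitions returned this way are candidates for re-running the same command without NOSCAN if row counts are needed.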

score 0 · Accepted Answer

According to the Hive manual, if you do not specify a partition spec, statistics are gathered for the whole table: https://cwiki.apache.org/confluence/display/Hive/StatsDev

When the user issues that command, he may or may not specify the partition specs. If the user doesn't specify any partition specs, statistics are gathered for the table as well as all the partitions (if any).
