
The syntax I have seen in Hive for computing statistics seems to suggest that the answer to the title question is "no":

ANALYZE TABLE [TABLENAME] PARTITION(partcol1=…, partcol2=…) COMPUTE STATISTICS

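Still, I wanted to ask here, because it is surprising that one would always have to write a script that iterates over the partitions and generates a statement for each one. We currently have about a thousand partitions on this small table, and that number will grow by orders of magnitude.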

By the way, I tried the following without specifying a partition:

hive> analyze table metrics compute statistics;
FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed
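
On a Hive version that rejects the partition-less form as shown above, the per-partition script mentioned in the question could look roughly like the following. This is only a sketch under assumptions: the helper names `partition_clause` and `analyze_statements` are hypothetical, the input is assumed to be partition specs in the `col=value/col=value` form that `SHOW PARTITIONS` prints, every value is quoted as a string literal, and values containing escaped characters are not handled.

```python
from typing import Iterable, List


def partition_clause(spec: str) -> str:
    """Turn a spec such as 'ds=2008-04-08/hr=11' into "ds='2008-04-08', hr='11'".

    Assumption: each path component has the form col=value, and quoting
    every value as a string literal is acceptable for the table in question.
    """
    parts = []
    for component in spec.strip().split("/"):
        column, _, value = component.partition("=")
        escaped = value.replace("'", "\\'")  # keep the generated literal well-formed
        parts.append(f"{column}='{escaped}'")
    return ", ".join(parts)


def analyze_statements(table: str, specs: Iterable[str]) -> List[str]:
    """Build one ANALYZE TABLE statement per non-empty partition spec."""
    return [
        f"ANALYZE TABLE {table} PARTITION({partition_clause(spec)}) COMPUTE STATISTICS;"
        for spec in specs
        if spec.strip()
    ]


if __name__ == "__main__":
    # Hypothetical sample input; in practice the specs would come from SHOW PARTITIONS.
    for statement in analyze_statements("metrics", ["day=20150831", "day=20150901"]):
        print(statement)
```

The generated statements could be written to a file and run in one batch, so the script only has to be re-run when new partitions appear.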

3 Answers

Answered 2014-11-12T14:12:36.127

I am on the latest Hive, 1.2, and the following command works fine:

hive> analyze table member partition(day) compute statistics noscan;
Partition mobi_mysql.member{day=20150831} stats: [numFiles=7, numRows=-1, totalSize=4735943322, rawDataSize=-1]
Partition mobi_mysql.member{day=20150901} stats: [numFiles=7, numRows=117512, totalSize=19741804, rawDataSize=0]
Partition mobi_mysql.member{day=20150902} stats: [numFiles=7, numRows=-1, totalSize=17734601, rawDataSize=-1]
Partition mobi_mysql.member{day=20150903} stats: [numFiles=7, numRows=-1, totalSize=13091084, rawDataSize=-1]
OK
Time taken: 2.089 seconds
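
With NOSCAN, Hive gathers only the number of files and the physical size, which is presumably why most partitions above still report numRows=-1. A small sketch for spotting those partitions in the console output follows. The regular expression is an assumption based solely on the line format shown above, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed line format, taken from the output above:
# Partition <db.table>{<spec>} stats: [key=value, key=value, ...]
STATS_LINE = re.compile(
    r"^Partition\s+(?P<table>[^{\s]+)\{(?P<spec>[^}]*)\}\s+stats:\s+\[(?P<stats>[^\]]*)\]"
)


def parse_stats_line(line: str) -> Optional[Tuple[str, str, Dict[str, int]]]:
    """Return (table, partition spec, stats) or None if the line does not match."""
    match = STATS_LINE.match(line.strip())
    if match is None:
        return None
    stats: Dict[str, int] = {}
    for item in match.group("stats").split(","):
        key, _, value = item.strip().partition("=")
        if key and value:
            stats[key] = int(value)
    return match.group("table"), match.group("spec"), stats


def partitions_missing_row_counts(lines: Iterable[str]) -> List[str]:
    """List partition specs whose numRows is absent or negative."""
    missing = []
    for line in lines:
        parsed = parse_stats_line(line)
        if parsed is not None and parsed[2].get("numRows", -1) < 0:
            missing.append(parsed[1])
    return missing
```

Running the same ANALYZE statement without NOSCAN on the partitions this returns should fill in the row counts, at the cost of scanning the data.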
Answered 2015-09-10T06:53:24.830

According to the Hive manual, if you do not specify any partition specs, statistics are gathered for the whole table: https://cwiki.apache.org/confluence/display/Hive/StatsDev

When the user issues that command, he may or may not specify the partition specs. If the user doesn't specify any partition specs, statistics are gathered for the table as well as all the partitions (if any).
Answered 2013-11-15T22:03:54.003