I am using Hive version 0.11.0 and get incorrect results when running this query:
select t1.symbol, max(t1.maxts - t1.orderts) as diff from
(select catid, symbol, max(cast(timestamp as double)*1000) as maxts, min(cast(timestamp as double)*1000) as orderts, count(*) as cnt
from cat where recordtype in (0,1) and customerid=srcrepid group by symbol, catid) t1
where t1.cnt > 1
group by t1.symbol;
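As you can see, there is a subquery with a group by clause. It computes the maximum and minimum timestamp values for each CATID and SYMBOL.
I have 24 symbols. In the outer query I want the largest difference per SYMBOL, so I group by SYMBOL and expect 24 result rows, one per symbol.
The problem is that the query returns 864 result rows instead. Hive does not seem to reduce the final result down to one row per symbol.
Is this a bug? Can anyone reproduce it? I have 6 nodes running, with 4 symbols per node.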
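To make the expected behaviour concrete, here is a minimal sketch that runs the same query shape in Python's built-in sqlite3 module rather than in Hive. Everything in it is an assumption for illustration: the cut-down table, the made-up rows, the hypothetical symbols AAA and BBB, and the column name ts standing in for timestamp. It only shows what I expect the nested group by to produce, not how Hive executes it.

```python
import sqlite3

# Toy stand-in for the Hive table: only the columns the query touches.
# All rows are made-up illustration data, not taken from the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cat (catid integer, customerid integer, srcrepid integer, "
    "recordtype integer, ts real, symbol text)"
)
rows = [
    # catid, customerid, srcrepid, recordtype, ts, symbol
    (1, 7, 7, 0, 10.0, "AAA"),
    (1, 7, 7, 1, 10.5, "AAA"),
    (2, 7, 7, 0, 20.0, "AAA"),
    (2, 7, 7, 1, 22.0, "AAA"),
    (3, 8, 8, 0, 30.0, "BBB"),
    (3, 8, 8, 1, 30.25, "BBB"),
    (4, 8, 8, 0, 40.0, "BBB"),  # only one record for catid 4: dropped by cnt > 1
    (5, 9, 1, 0, 50.0, "BBB"),  # customerid != srcrepid: dropped by the where clause
    (5, 9, 1, 1, 99.0, "BBB"),
]
conn.executemany("insert into cat values (?, ?, ?, ?, ?, ?)", rows)

# Same structure as the Hive query: inner group by (symbol, catid),
# outer group by symbol. "order by" is added only to make the output stable.
query = """
select t1.symbol, max(t1.maxts - t1.orderts) as diff from
(select catid, symbol, max(ts * 1000) as maxts, min(ts * 1000) as orderts, count(*) as cnt
 from cat where recordtype in (0, 1) and customerid = srcrepid
 group by symbol, catid) t1
where t1.cnt > 1
group by t1.symbol
order by t1.symbol
"""
result = conn.execute(query).fetchall()
for symbol, diff in result:
    print(symbol, diff)
```

With two symbols in the toy data, the outer group by yields exactly two rows. By the same logic I would expect 24 rows from my real table, not 864.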
The table used:
create table cat(CATID bigint, CUSTOMERID int, FILLPRICE double, FILLSIZE int, INSTRUMENTTYPE int, ORDERACTION int, ORDERSTATUS int, ORDERTYPE int, ORDID string, PRICE double, RECORDTYPE int, SIZE int, SRCORDID string, SRCREPID int, TIMESTAMP timestamp) PARTITIONED BY (SYMBOL string, REPID int) row format delimited fields terminated by ',' stored as ORC;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
Edit: updated because the query was inconsistent with the actual table used, which made it hard for anyone to help.