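This isn't really an answer about how to do it in Impala, but other SQL-on-Hadoop solutions already support analytic functions and subqueries. Without those capabilities, you will likely have to rely on a multi-step process or on UDAFs.
Disclosure: I am the architect of InfiniDB.
InfiniDB supports analytic functions and subqueries.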
http://infinidb.co
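Take a look at query 8 in the Radiant Advisors benchmark. It is a similar style of query to the one you are after, and it uses the rank analytic functions. Presto was also able to run this style of query, just roughly 80x slower.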
http://radiantadvisors.com/wp-content/uploads/2014/04/RadiantAdvisors_Benchmark_SQL-on-Hadoop_2014Q1.pdf
The query from the benchmark (query 8):
SELECT
    sub.visit_entry_idaction_url,
    sub.name,
    lv.referer_url,
    sum(visit_total_time) total_time,
    count(sub.idvisit),
    RANK() OVER (PARTITION BY sub.visit_entry_idaction_url
                 ORDER BY count(sub.idvisit)) rank_by_visits,
    DENSE_RANK() OVER (PARTITION BY sub.visit_entry_idaction_url
                       ORDER BY count(visit_total_time)) rank_by_time_spent
FROM
    log_visit lv,
    (
        SELECT
            visit_entry_idaction_url,
            name,
            idvisit
        FROM
            log_visit JOIN log_action
            ON (visit_entry_idaction_url = log_action.idaction)
        WHERE
            visit_entry_idaction_url BETWEEN 2301400 AND 2302400
    ) sub
WHERE
    lv.idvisit = sub.idvisit
GROUP BY
    1, 2, 3
ORDER BY
    1, 6, 7;
Results:
Engine    Version  Runtime
Hive      0.12     Not Executable
Presto    0.57     506.84s
InfiniDB  4.0      6.37s
Impala    1.2      Not Executable
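
For engines without analytic functions, the multi-step process mentioned at the top could look roughly like the sketch below. It is only an illustration under stated assumptions: the `visits` and `visit_counts` tables, their columns, and the sample rows are hypothetical, and SQLite (through Python's `sqlite3` module) is used purely as a stand-in so the SQL can be run anywhere. Step 1 materializes the aggregate, and step 2 emulates RANK() with a self-join by counting, within each partition, the rows that have a strictly smaller visit count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified stand-in for the benchmark's log_visit table.
cur.execute(
    "CREATE TABLE visits (entry_url INTEGER, referer_url TEXT, idvisit INTEGER)"
)
cur.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "a", 10), (1, "a", 11), (1, "a", 12),
        (1, "b", 13),
        (1, "c", 14),
        (2, "a", 15), (2, "a", 16),
        (2, "b", 17),
    ],
)

# Step 1: materialize the per-group aggregate as its own table.
cur.execute("""
    CREATE TABLE visit_counts AS
    SELECT entry_url, referer_url, COUNT(idvisit) AS visits
    FROM visits
    GROUP BY entry_url, referer_url
""")

# Step 2: emulate RANK() OVER (PARTITION BY entry_url ORDER BY visits).
# A row's rank is 1 plus the number of rows in the same partition with a
# strictly smaller visit count, so ties share a rank and leave a gap after.
rows = cur.execute("""
    SELECT a.entry_url, a.referer_url, a.visits,
           1 + COUNT(b.referer_url) AS rank_by_visits
    FROM visit_counts a
    LEFT JOIN visit_counts b
      ON b.entry_url = a.entry_url AND b.visits < a.visits
    GROUP BY a.entry_url, a.referer_url, a.visits
    ORDER BY a.entry_url, rank_by_visits, a.referer_url
""").fetchall()

for row in rows:
    print(row)
```

The self-join compares every row with every other row in its partition, so it gets expensive when partitions are large. That cost is one reason native analytic functions are preferable where they are available.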