How do I do a subselect in Hive? I think I may be making a very obvious mistake that just isn't obvious to me...

The error I get: FAILED: Parse Error: line 4:8 cannot recognize input 'SELECT' in expression specification

Here are my three source tables:

aaa_hit     -> [SESSION_KEY, HIT_KEY, URL]
aaa_event   -> [SESSION_KEY, HIT_KEY, EVENT_ID]
aaa_session -> [SESSION_KEY, REMOTE_ADDRESS]

...and what I want to do is insert the results into a result table like this:

result -> [url, num_url, event_id, num_event_id, remote_address, num_remote_address]

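...where column 1 is the URL, column 3 is the top 1 "event" per URL, and column 5 is the top 1 REMOTE_ADDRESS that visited that URL. (Each even-numbered column is the "count" of the column before it.)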

Sooooo... what am I doing wrong here?

INSERT OVERWRITE TABLE result2
SELECT url, 
       COUNT(url) AS access_url, 
       (SELECT events.event_id as evt, 
               COUNT(events.event_id) as access_evt
        FROM   aaa_event events 
               LEFT OUTER JOIN aaa_hit hits 
                 ON ( events.hit_key = hit_key )
                 ORDER BY access_evt DESC LIMIT 1), 
       (SELECT sessions.remote_address as remote_address, 
               COUNT(sessions.remote_address) as access_addr
        FROM   aaa_session sessions 
               RIGHT OUTER JOIN aaa_hit hits 
                 ON ( sessions.session_key = session_key )
                 ORDER BY access_addr DESC LIMIT 1) 
FROM   aaa_hit
ORDER  BY access_url DESC;

Thanks so much :)

2 Answers

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

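Hive supports subqueries only in the FROM clause (and, from Hive 0.13 onward, some forms in the WHERE clause).

You cannot use a subquery as a "column" in Hive.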
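
For contrast, here is a minimal sketch of the form Hive does accept: a subquery in the FROM clause, which must be given an alias. The table and column names come from the question; the alias `h` and the `WHERE` filter are only illustrative.

```sql
-- A subquery used as a derived table in FROM.
-- Hive requires the alias (here: h) after the closing parenthesis.
SELECT h.url,
       h.access_url
FROM   (SELECT url,
               COUNT(*) AS access_url
        FROM   aaa_hit
        GROUP  BY url) h
WHERE  h.access_url > 1;
```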

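To work around the limitation, put the subquery in the FROM clause and JOIN against it. (The following does not work as written, but it shows the idea.)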

SELECT url, 
       COUNT(url) AS access_url, 
       t2.col1, t2.col2 ...
FROM   aaa_hit
JOIN (SELECT events.event_id as evt, 
               COUNT(events.event_id) as access_evt
        FROM   aaa_event events 
               LEFT OUTER JOIN aaa_hit hits 
                 ON ( events.hit_key = hit_key )
                 ORDER BY access_evt DESC LIMIT 1), 
       (SELECT sessions.remote_address as remote_address, 
               COUNT(sessions.remote_address) as access_addr
        FROM   aaa_session sessions 
               RIGHT OUTER JOIN aaa_hit hits 
                 ON ( sessions.session_key = session_key )
                 ORDER BY access_addr DESC LIMIT 1) t2
ON (aaa_hit.THING = t2.THING)

See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins for more about using JOIN in Hive.
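
One way the idea above could be fleshed out is sketched below. It is not the answerer's query: it assumes Hive 0.11 or later (for the `ROW_NUMBER()` window function), and it assumes that `hit_key` and `session_key` are sufficient join keys, as the question's own query does. The aliases (`u`, `e`, `a`, `event_counts`, and so on) are made up for illustration. Each derived table first counts per URL, then keeps only the top-ranked row per URL.

```sql
-- Sketch only: assumes Hive 0.11+ and the table/column names from the question.
SELECT u.url                AS url,
       u.num_url            AS num_url,
       e.event_id           AS event_id,
       e.num_event_id       AS num_event_id,
       a.remote_address     AS remote_address,
       a.num_remote_address AS num_remote_address
FROM   (-- hits per URL
        SELECT url,
               COUNT(*) AS num_url
        FROM   aaa_hit
        GROUP  BY url) u
       LEFT OUTER JOIN
       (-- most frequent event per URL
        SELECT url, event_id, num_event_id
        FROM   (SELECT url, event_id, num_event_id,
                       ROW_NUMBER() OVER (PARTITION BY url
                                          ORDER BY num_event_id DESC) AS rn
                FROM   (SELECT h.url       AS url,
                               ev.event_id AS event_id,
                               COUNT(*)    AS num_event_id
                        FROM   aaa_hit h
                               JOIN aaa_event ev
                                 ON ( ev.hit_key = h.hit_key )
                        GROUP  BY h.url, ev.event_id) event_counts
               ) ranked_events
        WHERE  rn = 1) e
         ON ( e.url = u.url )
       LEFT OUTER JOIN
       (-- most frequent remote address per URL
        SELECT url, remote_address, num_remote_address
        FROM   (SELECT url, remote_address, num_remote_address,
                       ROW_NUMBER() OVER (PARTITION BY url
                                          ORDER BY num_remote_address DESC) AS rn
                FROM   (SELECT h.url            AS url,
                               s.remote_address AS remote_address,
                               COUNT(*)         AS num_remote_address
                        FROM   aaa_hit h
                               JOIN aaa_session s
                                 ON ( s.session_key = h.session_key )
                        GROUP  BY h.url, s.remote_address) address_counts
               ) ranked_addresses
        WHERE  rn = 1) a
         ON ( a.url = u.url )
ORDER  BY num_url DESC;
```

Ties for the top spot are broken arbitrarily by `ROW_NUMBER()`. To load the output into the result table, the statement can be prefixed with the `INSERT OVERWRITE TABLE ...` line from the question.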

answered 2011-06-17T22:16:17.270

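Your query has no GROUP BY, and COUNT is an aggregate function. Without a GROUP BY clause, the select list can contain only aggregates such as COUNT(*); selecting url next to COUNT(url) requires GROUP BY url.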

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy
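
As a small illustration of the point, using the `aaa_hit` table from the question (the alias `total_hits` is made up for the example):

```sql
-- url is not aggregated, so it has to appear in GROUP BY.
SELECT url,
       COUNT(url) AS access_url
FROM   aaa_hit
GROUP  BY url
ORDER  BY access_url DESC;

-- With no GROUP BY, the select list holds aggregates only.
SELECT COUNT(*) AS total_hits
FROM   aaa_hit;
```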

answered 2018-02-19T11:01:18.573