
Given two tables:

filtered_locations contains a small set of data (only a few rows)

|-------------|
| loc<String> | 
|-------------|
|     ...     |
|-------------|   

table_clients is a very large table (millions of rows)

 |-----------------------------------------------|
 | id  | name | age  | locations <array<String>> |
 |-----|------|------|---------------------------|
 |     |      |      | [a,b,c..]                 |
 |-----------------------------------------------|

I want to query table_clients for the values in filtered_locations. The main issue is that the field to query in table_clients is of array type.

So I exploded the column and then tried to embed a subquery to keep only the locations in filtered_locations.

The first problem I faced is that Hive (at least the version I am running) does not seem to accept subqueries inside an in or exists statement.

This is the error I get:

Error while compiling statement: FAILED: SemanticException Invalid column reference 'location' in definition of SubQuery sq_1 [ tc.location in (select fl.loc from filtered_locations fl)] used as sq_1

As an alternative, I tried a LEFT JOIN, but that did not work either because of the explode call. Second error:

Error while compiling statement: FAILED: SemanticException [Error 10085]: JOIN with a LATERAL VIEW is not supported 'location'

with filtered_locations as (
  SELECT
    'loc1' as loc
    union all
  SELECT
    'loc2' as loc
)

select 
  id, name, location,
  max(age) as max_age 
from
  table_clients tc
  LATERAL VIEW EXPLODE(locations) l as location
-- Ideally this should work!
-- where
--  tc.location in (
--     select fl.loc from filtered_locations fl
--  )
left join filtered_locations fl
on fl.loc = tc.location

group by id, name, location

So what is the best solution to my problem? Note that table_clients has millions of records!

Thanks


1 Answer


In theory, this should work:

select  *

from    table_clients c
        lateral view explode(locations) e as loc

where   e.loc in (select l.loc from filtered_locations l)
;

FAILED: SemanticException [Error 10009]: Line 6:8 Invalid table alias 'e'

...but since it does not, a workaround is needed:

select  *

from   (select  *

        from    table_clients c
                lateral view explode(locations) e as loc
        ) c        

where   c.loc in (select l.loc from filtered_locations l)
;
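
As a rough sketch only (not tested against the asker's Hive version), the same subquery wrapping can be applied to the aggregate query from the question. It assumes the array column is named locations, as in the question's schema, and it swaps the IN subquery for a LEFT SEMI JOIN, which is Hive's join form for this kind of membership filter. The aliases c, e and fl are just illustrative.

```sql
-- Explode the array inside a subquery so the outer query sees a plain
-- column (loc), then filter against the small table with a semi join.
select  c.id,
        c.name,
        c.loc,
        max(c.age) as max_age

from   (select  tc.id,
                tc.name,
                tc.age,
                e.loc

        from    table_clients tc
                lateral view explode(tc.locations) e as loc
        ) c

        left semi join filtered_locations fl
        on  c.loc = fl.loc

group by c.id, c.name, c.loc
;
```

Because filtered_locations has only a few rows, Hive may be able to run this as a map-side join, which avoids shuffling the millions of rows in table_clients for the join step; whether it does depends on the version and configuration.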
Answered 2017-09-12T08:29:28.217