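I am trying to do a left outer join between two tables using a non-equi join condition, which Hive does not support. Moving the condition into the WHERE clause causes data loss. Please let me know if anyone has a solution. A sample code snippet is below: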

Select B.dt ,D.field, sum(B.qty)
from A INNER join B ON A.dt= B.dt
INNER Join C ON B.nbr=C.nbr
LEFT OUTER JOIN D ON A.nbr2=D.Nbr2
AND B.nbr=D.nbr
---Below non equi join not supported
AND B.dt between C.start_date and C.End_Date 
-- Need suggestion of this non equi join.

This is the error Hive reports for the non-equi join: FAILED: SemanticException [Error 10017]: Line 9:4 Both left and right aliases encountered in JOIN 'START_DATE'

1 Answer

There is a way to do this in your case: a union all / window function approach. I think this does what you want:

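with t as (
      -- fact rows from A join B; the D-only columns are left null
      select a.nbr2, b.nbr, b.dt, null as end_date, null as field, b.qty
      from A join
           B
           on A.dt = B.dt
      union all
      -- interval rows from D, placed at start_date; qty is left null
      select d.nbr2, d.nbr, d.start_date, d.end_date, d.field, null
      from D
    )
-- keep the D field only while dt is still inside the interval
-- (use <= instead of < if end_date is inclusive, as it is with BETWEEN)
select dt, (case when dt < d_end_date then d_field end), sum(qty)
from (select t.*, 
             -- last_value(col, true) skips nulls, carrying the latest D values forward
             last_value(field, true) over (partition by nbr, nbr2 order by dt) as d_field,
             last_value(end_date, true) over (partition by nbr, nbr2 order by dt) as d_end_date
      from t
     ) t
group by dt, (case when dt < d_end_date then d_field end);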

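I'm not 100% sure this is exactly equivalent. For example, it assumes that D has at most one matching record and that its intervals do not overlap. But the idea is to interleave the values and use the window function last_value() with the ignore-nulls option to pick up the right value.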
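To make the interleaving idea concrete, here is a minimal sketch on made-up rows. The inline values, the single nbr key, and the string dates are assumptions for illustration only and are not taken from your tables. One row stands in for an interval from D and two rows stand in for facts from B:

```sql
-- Hypothetical toy data: one D-style interval row and two B-style fact rows.
with t as (
  -- interval row: starts 2020-01-01, ends 2020-01-10, carries field 'x'
  select 1 as nbr, '2020-01-01' as dt, '2020-01-10' as end_date,
         'x' as field, cast(null as int) as qty
  union all
  -- fact row that falls inside the interval
  select 1, '2020-01-05', cast(null as string), cast(null as string), 7
  union all
  -- fact row that falls after the interval has ended
  select 1, '2020-01-12', cast(null as string), cast(null as string), 3
)
select nbr, dt, qty,
       -- the second argument (true) tells Hive's last_value to skip nulls
       last_value(field, true)    over (partition by nbr order by dt) as d_field,
       last_value(end_date, true) over (partition by nbr order by dt) as d_end_date
from t;
```

Both fact rows should pick up the field and end date of the most recent interval row before them. The 2020-01-12 row still carries the values forward, which is why the outer query needs the case expression comparing dt with d_end_date: it blanks out the field once the interval has ended, which is what gives the unmatched-row behaviour of a left join.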

answered 2020-06-11T10:51:18.137