join - Impala: Is it logical to split a join query with an OR condition into two queries?

Question

I am using Impala to run a query with the following structure. It ran for more than 20 hours without finishing:

INSERT INTO Final_table
with t1
AS
(SELECT account_id, request_id, status_1
 FROM table_1
 WHERE status_1 = "20"
),
t2 AS
(
 SELECT account_id, request_id, status_2
 FROM table_2
 WHERE status_2 = "10"
)
SELECT t2.account_id, t2.request_id, t1.status_1, t2.status_2
FROM t1
INNER JOIN t2
ON (t1.account_id = t2.account_id OR t1.request_id = t2.request_id);

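The problem is precisely the OR condition in the ON clause: t1 on its own yields about 14M records and t2 on its own about 15M. Because I was running into memory problems, I took the t1 and t2 subqueries, executed them separately, and saved their results to new tables. I then performed the join as follows: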

CREATE TABLE sub_table_1
AS
 SELECT account_id, request_id, status_1
 FROM table_1
 WHERE status_1 = "20";

CREATE TABLE sub_table_2
AS
 SELECT account_id, request_id, status_2
 FROM table_2
 WHERE status_2 = "10";

INSERT INTO Final_table
SELECT t2.account_id, t2.request_id, t1.status_1, t2.status_2
FROM sub_table_1 AS t1
INNER JOIN sub_table_2 AS t2
ON (t1.account_id = t2.account_id OR t1.request_id = t2.request_id);

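The sub-tables were created successfully, but the final join still runs into the same problem. Would it be logically sound to perform the join in two steps, one condition per step, and then combine the two results? Or is there another approach that would help?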

score 0 · Accepted Answer

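You can use a UNION:

1. Get the result of the join on the first condition (result1).

2. UNION result1 with result2, the result of the join on the second condition.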

SELECT * FROM t1 JOIN t2 ON t1.account_id = t2.account_id
UNION
SELECT * FROM t1 JOIN t2 ON t1.request_id = t2.request_id
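
Applied to the tables from the question, the same idea might look like the sketch below. It assumes that sub_table_1, sub_table_2, and Final_table exist as described in the question, and that status_1 comes from sub_table_1 and status_2 from sub_table_2. The likely reason the split helps is that each branch has a plain equality condition, which Impala can generally execute as a hash join, whereas an ON clause made of an OR of two equalities typically falls back to a much slower nested loop join over roughly 14M x 15M row pairs.

```sql
-- Sketch only: table and column names are taken from the question.
-- Each branch is an equi-join on a single key; UNION combines the branches
-- and removes rows that were produced by both of them.
INSERT INTO Final_table
SELECT t2.account_id, t2.request_id, t1.status_1, t2.status_2
FROM sub_table_1 AS t1
INNER JOIN sub_table_2 AS t2
  ON t1.account_id = t2.account_id
UNION
SELECT t2.account_id, t2.request_id, t1.status_1, t2.status_2
FROM sub_table_1 AS t1
INNER JOIN sub_table_2 AS t2
  ON t1.request_id = t2.request_id;
```

Note that UNION removes every duplicate output row, not only those from pairs that satisfy both conditions. If several rows of sub_table_1 match the same row of sub_table_2, the original OR join would return one row per match, while this version returns a single row for each distinct combination of the four selected columns. If those repeated rows matter, the result is not strictly identical to the OR join.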
