
I am running the query below, and it takes a long time to run. Is there any way to make it faster?

SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien

FROM table1 a

INNER JOIN table1 b

ON a.data_date = b.data_date AND a.column3 = b.column3

WHERE a.data_date ='20131001'
and a.column3 = 12345
and a.column4 is not NULL
and b.column4 is NULL

GROUP BY
a.data_date

3 Answers


As far as I can tell, you do not need the JOIN at all.
You can get the same result with a single reference to the table.
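
One way to read "a single reference" is conditional aggregation. The sketch below is only an illustration, and it assumes the intent is to sum `column1` and `column2` over rows where `column4` is not NULL, and to sum `column1` over rows where `column4` is NULL. Note that the original self-join pairs every non-NULL row with every NULL row, so each of its sums is multiplied by the row count of the other side. The two queries therefore return identical numbers only when each side has exactly one row. The self-join also returns no row at all when either side is empty, whereas this version returns a row with NULL in the affected columns.

```sql
-- Single pass over table1: each SUM only sees the rows its CASE lets through.
-- Assumes the un-multiplied per-side sums are what is actually wanted.
SELECT
  data_date AS day
, SUM(CASE WHEN column4 IS NOT NULL THEN column1 END)
  + SUM(CASE WHEN column4 IS NOT NULL THEN column2 END) AS total
, SUM(CASE WHEN column4 IS NOT NULL THEN column1 END) AS part1
, SUM(CASE WHEN column4 IS NOT NULL THEN column2 END) AS part2
, SUM(CASE WHEN column4 IS NULL THEN column1 END) AS alien
FROM table1
WHERE data_date = '20131001'
  AND column3 = 12345
GROUP BY
  data_date;
```

Because the table is scanned once and no join is performed, this form avoids the shuffle that the self-join requires.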

Answered 2013-10-04T22:50:10.420

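Since this is the same table, I believe you can remove the join. It would be best to post sample data and the expected result so that we can help you better. Cheers =)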

SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
--remove this
--, sum(b.column1) as alien

FROM table1 a

--remove this
--INNER JOIN table1 b

--ON a.data_date = b.data_date AND a.column3 = b.column3

WHERE a.data_date ='20131001'
and a.column3 = 12345


and a.column4 is not NULL
--remove this
--and b.column4 is NULL

GROUP BY
a.data_date,a.column3
Answered 2013-10-05T00:02:22.793

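The right optimization technique also depends on the size of the tables.

The small table should come first in the join, and you should try to put that table in the distributed cache.

To make the query faster, apply the WHERE conditions before the join rather than after it, so that the join has fewer rows to process.

You can try something like the following: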

set hive.auto.convert.join=true;
select
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
from table1 b
inner join (select * from table1 WHERE data_date ='20131001'
and column3 = 12345
and column4 is not NULL
)a
on (a.data_date = b.data_date AND a.column3 = b.column3)

where b.column4 is NULL
GROUP BY
a.data_date
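
The same idea can be taken one step further by filtering both inputs before the join. This is a sketch only, assuming Hive and the table and column names from the question. The projected column lists are an assumption that no other columns are needed. The `data_date` and `column3` filters on the `b` side are implied by the join condition, so the result matches the original self-join.

```sql
-- Let Hive convert the join to a map-side join when one input is small enough.
SET hive.auto.convert.join=true;

SELECT
  a.data_date AS day
, SUM(a.column1) + SUM(a.column2) AS total
, SUM(a.column1) AS part1
, SUM(a.column2) AS part2
, SUM(b.column1) AS alien
FROM (
  -- Rows with a value in column4, reduced before the join.
  SELECT data_date, column3, column1, column2
  FROM table1
  WHERE data_date = '20131001'
    AND column3 = 12345
    AND column4 IS NOT NULL
) a
INNER JOIN (
  -- Rows with no value in column4, reduced before the join.
  SELECT data_date, column3, column1
  FROM table1
  WHERE data_date = '20131001'
    AND column3 = 12345
    AND column4 IS NULL
) b
ON a.data_date = b.data_date AND a.column3 = b.column3
GROUP BY
  a.data_date;
```

Selecting only the needed columns in each subquery keeps the rows passed to the join narrow, which matters most when `table1` is wide.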
Answered 2013-10-06T12:03:45.103