sql - Why is INTERSECT as slow as a nested JOIN?

Question

I'm using MS SQL.

I have a huge table with an index that makes this query fast:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

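It returns in under 1 second. The table has billions of rows. There are only about 10,000 results.

I would expect this query to also complete in about a second: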

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'

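But it takes 20 seconds. Each individual query takes < 1 second and returns about 10k results.

I would expect SQL to internally put the results of each of these subqueries into a hash table and do a hash intersection, which should be O(n). The result sets are small enough to fit in memory, so I doubt this is an IO problem.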
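For illustration, here is a minimal Python sketch of the hash intersection described above. It is not how SQL Server executes INTERSECT; the function name `hash_intersect` and the sample user ids are made up for the example.

```python
def hash_intersect(*result_sets):
    """Intersect several iterables of user ids using hash sets.

    Building each set is linear in its input, and each probe is O(1) on
    average, so the probing work is roughly linear in the total number of rows.
    """
    if not result_sets:
        return set()
    # Start from the smallest set so the candidate set stays as small as possible.
    ordered = sorted((set(rs) for rs in result_sets), key=len)
    common = ordered[0]
    for other in ordered[1:]:
        common = {uid for uid in common if uid in other}
    return common


# Hypothetical per-place result sets (duplicates are allowed in the inputs).
place_47828 = [1, 2, 3, 4, 4]
place_40652 = [2, 3, 4, 5]
place_14403 = [3, 4, 6]

print(sorted(hash_intersect(place_47828, place_40652, place_14403)))  # [3, 4]
```

With three inputs of about 10k rows each, this kind of in-memory intersection is trivial, which is why the 20-second runtime is surprising.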

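I wrote an alternative query that is just a series of nested JOINs, and it also takes about 20 seconds, which makes sense.

Why is INTERSECT so slow? Is it reduced to a JOIN at an early stage of query processing?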

score 14 · Accepted Answer

Try this instead. Obviously untested, but I think it will give you the results you want.

select userid 
    from IncrementalStatistics 
    where IncrementalStatisticsTypeID = 5 
        and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)  
        and IncrementalStatistics.Created > '12/2/2010'
    group by userid
    having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
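Since the query above is untested, here is a small sanity check, sketched in Python with an in-memory SQLite database standing in for MS SQL. It compares the three-way INTERSECT with the GROUP BY / HAVING rewrite on a handful of made-up rows. The sample data and the ISO date strings are assumptions for the example (SQLite compares dates as text), and it says nothing about performance on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IncrementalStatistics (
        userid INTEGER,
        IncrementalStatisticsTypeID INTEGER,
        AssociatedPlaceID INTEGER,
        Created TEXT  -- ISO dates so that text comparison orders correctly
    )
""")
# Made-up rows: users 1 and 4 visited all three places after the cutoff,
# user 2 only visited two places (one of them twice), and user 3's visit
# to place 47828 is before the cutoff.
rows = [
    (1, 5, 47828, "2010-12-05"), (1, 5, 40652, "2010-12-06"), (1, 5, 14403, "2010-12-07"),
    (2, 5, 47828, "2010-12-05"), (2, 5, 47828, "2010-12-08"), (2, 5, 40652, "2010-12-06"),
    (3, 5, 47828, "2010-11-01"), (3, 5, 40652, "2010-12-06"), (3, 5, 14403, "2010-12-07"),
    (4, 5, 47828, "2010-12-05"), (4, 5, 40652, "2010-12-06"), (4, 5, 14403, "2010-12-07"),
    (4, 7, 14403, "2010-12-09"),
]
conn.executemany("INSERT INTO IncrementalStatistics VALUES (?, ?, ?, ?)", rows)

places = (47828, 40652, 14403)

# Original approach: one filtered SELECT per place, combined with INTERSECT.
place_filter = """
    SELECT userid FROM IncrementalStatistics
    WHERE IncrementalStatisticsTypeID = 5
      AND AssociatedPlaceID = {place}
      AND Created > '2010-12-02'
"""
intersect_sql = " INTERSECT ".join(place_filter.format(place=p) for p in places)

# Rewrite from the answer: a single pass, keeping users seen at all three places.
grouped_sql = """
    SELECT userid FROM IncrementalStatistics
    WHERE IncrementalStatisticsTypeID = 5
      AND AssociatedPlaceID IN (47828, 40652, 14403)
      AND Created > '2010-12-02'
    GROUP BY userid
    HAVING COUNT(DISTINCT AssociatedPlaceID) = 3
"""

intersect_users = sorted(r[0] for r in conn.execute(intersect_sql))
grouped_users = sorted(r[0] for r in conn.execute(grouped_sql))
print(intersect_users, grouped_users)  # [1, 4] [1, 4]
```

The DISTINCT inside the COUNT matters: user 2 has three matching rows but only two distinct places, so a plain count(*) = 3 would wrongly include that user.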
