I have a very large web forum application running on a SQL Server 2012 database (about 20 million posts since 2001). The data file is about 40 GB.

I have added indexes on the appropriate columns, but this query (which shows the date range of posts in each forum) takes about 40 minutes to run:

SELECT
    T2.ForumId,
    Forums.Title,
    T2.ForumThreads,
    T2.ForumPosts,
    T2.ForumStart,
    T2.ForumStop

FROM
    Forums
    INNER JOIN (

    SELECT
        Min(ThreadStart) As ForumStart,
        Max(ThreadStop) As ForumStop,
        Count(*) As ForumThreads,
        Sum(ThreadPosts) As ForumPosts,
        Threads.ForumId
    FROM
        Threads
        INNER JOIN (

            SELECT
                Min(Posts.DateTime) As ThreadStart,
                Max(Posts.DateTime) As ThreadStop,
                Count(*) As ThreadPosts,
                Posts.ThreadId
            FROM
                Posts
            GROUP BY
                Posts.ThreadId

        ) As P2 ON Threads.ThreadId = P2.ThreadId

    GROUP BY
        Threads.ForumId

) AS T2 ON T2.ForumId = Forums.ForumId

How can I speed this up?

Update:

Here is the estimated execution plan, read from right to left:

[Path 1]

Clustered Index Scan (Clustered) [Posts].[PK_Posts], Cost: 98%
Hash Match (Partial Aggregate), Cost: 2%
Parallelism (Repartition Streams), Cost: 0%
Hash Match (Aggregate), Cost: 0%
Compute Scalar, Cost: 0%
Bitmap (Bitmap Create), Cost: 0%

[Path 2]

Index Scan (NonClustered) [Threads].[IX_ForumId], Cost: 0%
Parallelism (Repartition Streams), Cost: 0%

[Path 1 and 2 converge into Path 3]

Hash Match (Inner Join), Cost: 0%
Hash Match (Partial Aggregate), Cost: 0%
Parallelism (Repartition Streams), Cost: 0%
Sort, Cost: 0%
Stream Aggregate (Aggregate), Cost: 0%
Compute Scalar, Cost: 0%

[Path 4]
Clustered Index Seek (Clustered) [Forums].[PK_Forums], Cost: 0%

[Path 3 and 4 converge into Path 5]

Nested Loops (Inner Join), Cost: 0%
Parallelism (Gather Streams), Cost: 0%
SELECT, Cost: 0%

6 Answers

1

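Have you tried putting the two derived tables into #temp tables? SQL Server will create (single-column) statistics on them, and you can also put indexes on them.

Also, at first glance an indexed view might help, since you have a lot of aggregates.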
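
The #temp table suggestion could look roughly like the sketch below. It is only an illustration built from the table and column names in the question; the temp table names (#ThreadStats, #ForumStats) and the index name are made up for the example, and whether it beats the original query depends on the data and the existing indexes.

```sql
-- Step 1: materialize the innermost derived table (one row per thread).
SELECT
    p.ThreadId,
    MIN(p.[DateTime]) AS ThreadStart,
    MAX(p.[DateTime]) AS ThreadStop,
    COUNT(*)          AS ThreadPosts
INTO #ThreadStats                      -- hypothetical temp table name
FROM Posts AS p
GROUP BY p.ThreadId;

-- Index the temp table on the join column (ThreadId is unique after GROUP BY).
CREATE UNIQUE CLUSTERED INDEX IX_ThreadStats_ThreadId
    ON #ThreadStats (ThreadId);

-- Step 2: roll the per-thread rows up to one row per forum.
SELECT
    t.ForumId,
    MIN(s.ThreadStart) AS ForumStart,
    MAX(s.ThreadStop)  AS ForumStop,
    COUNT(*)           AS ForumThreads,
    SUM(s.ThreadPosts) AS ForumPosts
INTO #ForumStats                       -- hypothetical temp table name
FROM Threads AS t
INNER JOIN #ThreadStats AS s
    ON s.ThreadId = t.ThreadId
GROUP BY t.ForumId;

-- Step 3: the final join only touches one small row per forum.
SELECT
    f.ForumId,
    f.Title,
    fs.ForumThreads,
    fs.ForumPosts,
    fs.ForumStart,
    fs.ForumStop
FROM Forums AS f
INNER JOIN #ForumStats AS fs
    ON fs.ForumId = f.ForumId;

DROP TABLE #ForumStats;
DROP TABLE #ThreadStats;
```

Splitting the work this way also makes it easy to time each step separately and see where the 40 minutes actually go.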

answered 2012-06-07T22:02:09.603
1

How about something like this? Anyway, you get the idea...

SELECT f.ForumID,
f.Title,
MIN(p.[DateTime]) as ForumStart,
MAX(p.[DateTime]) as ForumStop,
COUNT(*) as ForumPosts, -- one joined row per post
COUNT(DISTINCT t.ThreadID) as ForumThreads
FROM Forums f
INNER JOIN Threads t
ON f.ForumID = t.ForumID
INNER JOIN Posts p
ON p.ThreadID = t.ThreadID -- join each post to its thread
GROUP BY f.ForumID, f.Title
answered 2012-06-07T22:17:20.357
1

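The indexes may help when you SELECT FROM the base tables, but the results of the subqueries are not indexed. Joining them is probably what kills performance.

As Buckley suggested, I would try storing the intermediate results in #temp tables and adding indexes before running the final query.

However, the outer SELECT does not include any thread-specific information. It looks like the query just selects the min/max dates per forum. If so, you can get the min/max/count of posts grouped by forum directly.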

answered 2012-06-07T22:23:01.100
0

Do you really need to aggregate twice? Would this query give you the same result?

SELECT
    T2.ForumId,
    Forums.Title,
    T2.ForumThreads,
    T2.ForumPosts,
    T2.ForumStart,
    T2.ForumStop
FROM
    Forums
INNER JOIN (
    -- Aggregate once, directly over the Threads-to-Posts join
    SELECT
        Min(Posts.DateTime) As ForumStart,
        Max(Posts.DateTime) As ForumStop,
        Count(DISTINCT Threads.ThreadId) As ForumThreads,
        Count(*) As ForumPosts,
        Threads.ForumId
    FROM
        Threads
    INNER JOIN Posts ON Threads.ThreadId = Posts.ThreadId
    GROUP BY
        Threads.ForumId
    ) AS T2 ON T2.ForumId = Forums.ForumId
answered 2012-06-07T22:09:02.980
0

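If you denormalize by adding ForumId to the Posts table, you can query all of these statistics directly from the Posts table. With the right index, this would probably perform well. Of course, it requires a small change to your code so that ForumId is included when inserting into the Posts table...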
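
A rough sketch of that denormalization follows. It assumes ForumId is an int and uses a made-up index name; the GO lines are batch separators understood by SSMS and sqlcmd, needed here because the new column must exist before a later batch can reference it. On a table with 20 million rows, the backfill may need to be run in smaller batches.

```sql
-- Add the denormalized column (the int type is an assumption).
ALTER TABLE Posts ADD ForumId int NULL;
GO

-- Backfill ForumId from the owning thread.
UPDATE p
SET    p.ForumId = t.ForumId
FROM   Posts AS p
INNER JOIN Threads AS t
    ON t.ThreadId = p.ThreadId;
GO

-- Hypothetical index that covers the statistics query below.
CREATE NONCLUSTERED INDEX IX_Posts_ForumId_DateTime
    ON Posts (ForumId, [DateTime])
    INCLUDE (ThreadId);
GO

-- All statistics now come from Posts; Threads is no longer needed.
SELECT
    f.ForumId,
    f.Title,
    COUNT(DISTINCT p.ThreadId) AS ForumThreads,
    COUNT(*)                   AS ForumPosts,
    MIN(p.[DateTime])          AS ForumStart,
    MAX(p.[DateTime])          AS ForumStop
FROM Posts AS p
INNER JOIN Forums AS f
    ON f.ForumId = p.ForumId
GROUP BY f.ForumId, f.Title;
```

The trade-off is that moving a thread to another forum now also has to update ForumId on all of its posts.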

answered 2012-06-08T21:18:40.700
0

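I added more indexes to the database and it sped things up considerably. Execution time is now about 20 seconds (!!). I admit that many of the indexes I added were guesswork (or just added at random).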
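
For anyone who would rather not guess: the estimated plan in the question puts 98% of the cost on the Clustered Index Scan of [Posts].[PK_Posts], so one plausible candidate is a narrow nonclustered index that covers the per-thread aggregate. The sketch below is an assumption about what might help, not a record of the indexes that were actually added, and the index name is made up.

```sql
-- Hypothetical covering index for the innermost GROUP BY Posts.ThreadId.
-- It contains only the two columns that subquery reads, so scanning it
-- touches far fewer pages than scanning the clustered index on Posts.
CREATE NONCLUSTERED INDEX IX_Posts_ThreadId_DateTime
    ON Posts (ThreadId, [DateTime]);
```

Because the index is ordered by ThreadId and then DateTime, the optimizer can also compute the per-thread MIN, MAX and COUNT with an ordered scan instead of a hash aggregate, though the plan it actually picks should be checked rather than assumed.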

answered 2012-08-25T13:14:01.517