
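I have a table that serves as a link table between objects in my SQL Server 2012 database (annonsid, annonsid2). The table is used to build triangular or even rectangular chains, to see who can swap with whom.

This is the query I run against the Matching_IDs table, which has 1.5 million rows; it produces 14 million possible chains: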

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2

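I have to get this down to 1 second or less. Is there a faster way to do this? The query takes about 1 minute on my computer. I normally use a WHERE m.annonsid = x, but it takes the same time, because it has to go through all possible combinations anyway.

Update: latest query plan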

|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
   |--Stream Aggregate(DEFINE:([globalagg1011]=SUM([partialagg1010])))
        |--Parallelism(Gather Streams)
             |--Stream Aggregate(DEFINE:([partialagg1010]=Count(*)))
                  |--Hash Match(Inner Join, HASH:([m2].[annonsid2], [m2].[annonsid])=([m3].[annonsid], [m].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m2].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m3].[annonsid] AND [MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m2].[annonsid]))
                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m2].[annonsid2], [m2].[annonsid]))
                       |    |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m2]))
                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid], [m].[annonsid2]))
                            |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([m].[annonsid])=([m3].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m].[annonsid]=[MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m3].[annonsid2]))
                                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m].[annonsid]), ORDER BY:([m].[annonsid] ASC))
                                 |    |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133152] AS [m]), ORDERED FORWARD)
                                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid2]), ORDER BY:([m3].[annonsid2] ASC))
                                      |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m3]), ORDERED FORWARD)

5 Answers


Some ideas:

Try two indexes, one on (annonsid, annonsid2) and one on (annonsid2, annonsid).
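
A minimal sketch of those two composite indexes, assuming the table lives in the dbo schema (as the query plan suggests). The index names are made up for illustration:

```sql
-- Hypothetical index names; dbo schema assumed from the query plan.
CREATE NONCLUSTERED INDEX IX_Matching_IDs_annonsid_annonsid2
    ON dbo.Matching_IDs (annonsid, annonsid2);

CREATE NONCLUSTERED INDEX IX_Matching_IDs_annonsid2_annonsid
    ON dbo.Matching_IDs (annonsid2, annonsid);
```

Each index contains both columns, so every leg of the self-join can be answered from an index alone, in either direction.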

Have you tried a columnstore index? It makes the table read-only, but it may improve performance.
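
As a rough sketch of the columnstore idea on SQL Server 2012 (the index name is hypothetical, and the dbo schema is assumed from the query plan):

```sql
-- SQL Server 2012: a nonclustered columnstore index makes the table read-only.
-- NCCI_Matching_IDs is a made-up name for illustration.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Matching_IDs
    ON dbo.Matching_IDs (annonsid, annonsid2);

-- To change data later: disable the index, modify the rows, then rebuild.
ALTER INDEX NCCI_Matching_IDs ON dbo.Matching_IDs DISABLE;
-- ... INSERT / UPDATE / DELETE statements go here ...
ALTER INDEX NCCI_Matching_IDs ON dbo.Matching_IDs REBUILD;
```

Whether the optimizer actually picks the columnstore index for this self-join is something you would have to confirm in the query plan.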

Some variations of the query may also help. For example:

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
where m.annonsid = m3.annonsid2

or

SELECT COUNT(*)
FROM Matching_IDs AS m, Matching_IDs AS m2, Matching_IDs AS m3
where m2.annonsid2 = m3.annonsid
  and m.annonsid2 = m2.annonsid
  and m.annonsid = m3.annonsid2

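Have you checked the CPU and I/O load? If the I/O load is high, the server is not crunching numbers but swapping, in which case more RAM could solve the problem.

How fast is this query?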

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid

If this is very fast but adding the next join slows it down, you probably need more RAM.
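
One way to check this from the query window rather than from server monitoring is to switch on the I/O and timing statistics around the test query. This is only a sketch of how the measurement could be taken:

```sql
-- Report reads per table and CPU vs. elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Many physical reads, or an elapsed time far above the CPU time, would point to I/O rather than computation as the bottleneck.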

answered 2012-12-30T09:47:36.920

It seems like you already indexed this quite well. You can try converting the hash to a merge join by adding the right multi-column index, but it won't give you the desired speedup of 60x.

I think this index would be on annonsid, annonsid2 although I might have made a mistake here.

It would be nice to materialize all of this but indexed views do not support self-joins. You can try to materialize this query (unaggregated) into a new table. Whenever you execute DML against the base table, also update the second table (using either application logic or triggers). That would allow you to query blazingly fast.
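
A minimal sketch of that materialization, assuming a new table called Matching_Triangles (a made-up name) and leaving out the trigger or application logic that would have to keep it in sync:

```sql
-- One row per chain a -> b -> c -> a; the table and column names are hypothetical.
SELECT m.annonsid  AS a,
       m2.annonsid AS b,
       m3.annonsid AS c
INTO dbo.Matching_Triangles
FROM dbo.Matching_IDs AS m
  INNER JOIN dbo.Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN dbo.Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2;

-- Cluster on the starting object so lookups by a become index seeks.
CREATE CLUSTERED INDEX IX_Matching_Triangles_a
    ON dbo.Matching_Triangles (a, b, c);

-- Chains starting at one object (the value 1 is just an example id).
DECLARE @x int = 1;
SELECT COUNT(*) FROM dbo.Matching_Triangles WHERE a = @x;
```

With 14 million chains this table is roughly ten times larger than the base table, so the fast reads are paid for with storage and slower writes.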

answered 2012-12-29T23:54:20.257

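You should break this query up into separate steps. I think you should first create a table in which you store the primary key plus annonsid and annonsid2, in case annonsid itself is not the primary key: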

DECLARE @AnnonsIds TABLE
(
primaryKey int,
-- if you need later more info from the original rows like captions  
-- AND it is not (just) the annonsid
annonsid int,
annonsid2 int
)

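If you declare such a table and you have an index on these columns, it is quick to fetch the relevant rows by filtering with WHERE annonsid = @annonsid OR annonsid2 = @annonsid.

After this first step you have a much smaller (I guess) and "thin" table to work with. Then you only need to do the joins here, or create a temporary table and a CTE on top of it.

I think it should be faster, depending on the selectivity of your WHERE condition: if 1.1 million rows match it, there is no point, but if only a few hundred or a few thousand do, you should give it a try!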
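
The steps above could be sketched as follows for the triangle case. This is one possible reading of the idea, not a tested solution: the table variable is trimmed to the two id columns, the value of @annonsid is a placeholder, and the middle leg of the chain still has to come from the full table because it does not touch @annonsid.

```sql
DECLARE @annonsid int = 1;  -- placeholder id to search from

-- Step 1: keep only the rows that start or end at @annonsid.
DECLARE @AnnonsIds TABLE (annonsid int NOT NULL, annonsid2 int NOT NULL);

INSERT INTO @AnnonsIds (annonsid, annonsid2)
SELECT annonsid, annonsid2
FROM dbo.Matching_IDs
WHERE annonsid = @annonsid OR annonsid2 = @annonsid;

-- Step 2: first and last leg come from the small table,
-- the middle leg is looked up in the base table.
SELECT COUNT(*)
FROM @AnnonsIds AS m
  INNER JOIN dbo.Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN @AnnonsIds AS m3
     ON m2.annonsid2 = m3.annonsid
WHERE m.annonsid = @annonsid
  AND m3.annonsid2 = @annonsid;
```

The lookup of the middle leg benefits from an index that starts with annonsid on Matching_IDs.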

answered 2012-12-30T07:36:32.730

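1 - Change SELECT COUNT(*) to COUNT(1) or COUNT(id)

2 - Put SET NOCOUNT ON at the start of the stored procedure or at the start of the query

3 - Use an index on annonsid and annonsid2

4 - Have your index come after the primary key in your table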

answered 2012-12-31T07:42:34.513

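You can denormalize the data by adding a RelatedIds table with the columns AnnonsId, RelatedAnnonId and Distance. For each value of AnnonsId, the table would contain a row for every RelatedAnnonId together with the number of relationships that must be traversed to reach it, i.e. the Distance. Triggers on the existing Matching_IDs table would maintain the new table up to some configured maximum Distance, e.g. 3 to handle rectangular shares. Index the table on (AnnonsId, Distance).

Edit: An index on (Distance, AnnonsId) would let you quickly find rows that have enough related entries to form a particular shape. Adding a MaxDistance column could be useful if you want to be able to exclude rows based on shape, e.g. rows that have triangular but not rectangular relationships.

The new query would use inner join RelatedIds as RI on RI.AnnonsId = m.AnnonsId and RI.Distance <= @MaxDistance, with the desired "shape" dictating @MaxDistance.

It should give much better performance on the select. The downsides are another table with a large number of rows, and the trigger overhead whenever the Matching_IDs table changes.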
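
A minimal sketch of what that table and a lookup against it might look like. The column types, the index name and the sample values are assumptions, and the triggers that maintain the table are not shown:

```sql
-- Column types are assumed to match the int ids in Matching_IDs.
CREATE TABLE dbo.RelatedIds
(
    AnnonsId       int NOT NULL,
    RelatedAnnonId int NOT NULL,
    Distance       int NOT NULL
);

-- Index suggested above; the name is hypothetical.
CREATE CLUSTERED INDEX IX_RelatedIds_AnnonsId_Distance
    ON dbo.RelatedIds (AnnonsId, Distance);

-- Everything reachable from one object within @MaxDistance steps.
DECLARE @x int = 1, @MaxDistance int = 3;

SELECT RI.RelatedAnnonId, RI.Distance
FROM dbo.RelatedIds AS RI
WHERE RI.AnnonsId = @x
  AND RI.Distance <= @MaxDistance;
```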

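Example: Matching_IDs contains two entries, (1,2) and (2,3). The new table would contain 3 entries:
1 -> 2: distance = 1
1 -> 3: distance = 2 (getting from 1 to 3 requires one intermediate "node")
2 -> 3: distance = 1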

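Adding an entry (3,1) to Matching_IDs would result, among other new rows, in this entry:
1 -> 1: distance = 3

Voilà: you have found a triangle (distance = 3).

Now, to find all triangles, just do the following: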

select * 
  from RelatedIds 
 where AnnonsId=RelatedAnnonId 
   and Distance=3
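
To try the example without writing triggers, the table can be filled with plain set-based inserts. The sketch below uses temporary tables and the three example rows. It records path lengths up to 3 rather than shortest distances, and DISTINCT collapses multiple paths between the same pair, so it shows whether a triangle exists but not how many chains there are:

```sql
-- Example data from above, held in temporary tables.
CREATE TABLE #Matching_IDs (annonsid int NOT NULL, annonsid2 int NOT NULL);
INSERT INTO #Matching_IDs (annonsid, annonsid2) VALUES (1, 2), (2, 3), (3, 1);

CREATE TABLE #RelatedIds
(
    AnnonsId       int NOT NULL,
    RelatedAnnonId int NOT NULL,
    Distance       int NOT NULL
);

-- Distance 1: the direct links.
INSERT INTO #RelatedIds (AnnonsId, RelatedAnnonId, Distance)
SELECT DISTINCT annonsid, annonsid2, 1
FROM #Matching_IDs;

-- Distance 2: one intermediate node.
INSERT INTO #RelatedIds (AnnonsId, RelatedAnnonId, Distance)
SELECT DISTINCT m.annonsid, m2.annonsid2, 2
FROM #Matching_IDs AS m
  INNER JOIN #Matching_IDs AS m2 ON m.annonsid2 = m2.annonsid;

-- Distance 3: two intermediate nodes.
INSERT INTO #RelatedIds (AnnonsId, RelatedAnnonId, Distance)
SELECT DISTINCT m.annonsid, m3.annonsid2, 3
FROM #Matching_IDs AS m
  INNER JOIN #Matching_IDs AS m2 ON m.annonsid2 = m2.annonsid
  INNER JOIN #Matching_IDs AS m3 ON m2.annonsid2 = m3.annonsid;

-- Triangles: objects that reach themselves in exactly three steps.
SELECT *
FROM #RelatedIds
WHERE AnnonsId = RelatedAnnonId
  AND Distance = 3;
```

With the three example rows, the final query returns one row for each corner of the triangle (1, 2 and 3).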
answered 2012-12-30T19:46:22.360