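I have a table of about 500 points and I'm looking for duplicates within a tolerance. The query below takes under a second and returns 500 rows. Most of the distances are zero, because it pairs each point with itself (PointA = PointB).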

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
   -- AND
   -- PointA.ObjectId <> PointB.ObjectID
ORDER BY ObjectIDa

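If I enable the commented-out lines near the bottom, I get 14 rows, but execution time rises to 14 seconds. That isn't a big deal yet, but it will be once my points table grows to a hundred thousand rows.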

Apologies in advance if this has already been answered. I did look, but as a newbie I get lost reading posts that are over my head.

Addendum: ObjectID is a bigint and the table's PK, so I realized I could change the condition to

AND PointA.ObjectID > PointB.ObjectID

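This now takes half the time and returns half the results (7 rows in 7 seconds). I no longer get mirrored duplicates (point 4 is close to point 8, and then point 8 is close to point 4). But performance still worries me: the table will become very large, so any performance issue will become a real problem.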
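The row counts in the question follow directly from how the self-join pairs rows. With no ObjectId filter, every point matches itself. With `<>`, each close pair appears twice, as (4, 8) and (8, 4). With `>` (or `<`), each close pair appears exactly once. The sketch below shows this using SQLite through Python's `sqlite3` module as a stand-in. The `X`/`Y` columns and the squared-distance test are assumptions that replace the SQL Server `Geometry` column and `STDistance`, and the four sample points are made up.

```python
import sqlite3

# Hypothetical stand-in for CadData.Survey.SurveyPoint: plain X/Y columns
# replace the SQL Server geometry column (illustration only).
POINTS = [
    (1, 0.0, 0.0),
    (4, 10.0, 10.0),
    (8, 10.03, 10.0),   # within 0.05 of point 4
    (9, 20.0, 5.0),
]
TOL = 0.05


def count_matches(extra_predicate: str) -> int:
    """Count rows of the tolerance self-join, optionally with an ObjectId filter."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SurveyPoint (ObjectId INTEGER PRIMARY KEY, X REAL, Y REAL)")
    con.executemany("INSERT INTO SurveyPoint VALUES (?, ?, ?)", POINTS)
    # Squared distance compared with TOL squared avoids a square root.
    sql = f"""
        SELECT COUNT(*)
        FROM SurveyPoint a
        JOIN SurveyPoint b
          ON (a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y) < ? * ?
         {extra_predicate}
    """
    (n,) = con.execute(sql, (TOL, TOL)).fetchone()
    con.close()
    return n


print(count_matches(""))                              # 6: four self-matches plus (4, 8) and (8, 4)
print(count_matches("AND a.ObjectId <> b.ObjectId"))  # 2: (4, 8) and (8, 4)
print(count_matches("AND a.ObjectId > b.ObjectId"))   # 1: only (8, 4)
```

The counts only show which rows each predicate keeps. They say nothing about how SQL Server plans or times the query.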

Addendum 2: Rearranging the JOIN and the AND (or using a WHERE, as suggested), as below, makes no difference either.

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.ObjectId < PointB.ObjectID
    WHERE
    PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa
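Addendum 2's result is what you would expect for an INNER JOIN. A predicate in `ON` and the same predicate in `WHERE` keep the same rows, so the optimizer is free to treat the two layouts identically. The sketch below checks only that result equivalence. It uses SQLite via Python's `sqlite3` as a stand-in, with hypothetical `X`/`Y` columns and a squared-distance test in place of `Geometry.STDistance`, and it does not model SQL Server's plans or timings.

```python
import sqlite3

# Hypothetical stand-in table (X/Y instead of a geometry column).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SurveyPoint (ObjectId INTEGER PRIMARY KEY, X REAL, Y REAL)")
con.executemany(
    "INSERT INTO SurveyPoint VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (4, 10.0, 10.0), (8, 10.03, 10.0), (9, 20.0, 5.0)],
)
TOL = 0.05
DIST_SQ = "(a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y)"

# Both predicates in the ON clause.
in_on = con.execute(
    "SELECT a.ObjectId, b.ObjectId FROM SurveyPoint a JOIN SurveyPoint b "
    f"ON a.ObjectId < b.ObjectId AND {DIST_SQ} < ? ORDER BY 1, 2",
    (TOL * TOL,),
).fetchall()

# ObjectId filter in ON, distance test moved to WHERE (the Addendum 2 layout).
in_where = con.execute(
    "SELECT a.ObjectId, b.ObjectId FROM SurveyPoint a JOIN SurveyPoint b "
    f"ON a.ObjectId < b.ObjectId WHERE {DIST_SQ} < ? ORDER BY 1, 2",
    (TOL * TOL,),
).fetchall()

print(in_on)              # [(4, 8)]
print(in_on == in_where)  # True
```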

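Interestingly, I can raise @Tol to a value large enough to return more than 100 rows with no change in performance, even though that requires a lot of calculation. But then adding a simple A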

4 Answers

This is an interesting problem.

It's not unrealistic that you get a big performance boost by changing the "<>" to ">".

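As others have mentioned, the trick is to make the best use of your indexes. By using ">", you make it easy for the server to restrict itself to a particular range of your PK, so it avoids looking "backward" once it has already checked "forward".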

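This improvement will scale: it keeps helping as you add rows. But you're right to worry that it won't stop the work from growing. As you correctly reason, whenever there are more rows to scan, the query will take longer. That's the situation here, because we always want to compare every point against every other point.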
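The scaling point can be made concrete by counting candidate pairs. The count below rests on a simplifying assumption: no index prunes candidates, so the distance is computed for every pair that passes the ObjectId filter. The real execution plan may differ. The helper `distance_evaluations` is hypothetical and exists only for this illustration.

```python
def distance_evaluations(n: int, predicate: str) -> int:
    """Pairs whose distance must be computed in a self-join of n rows,
    assuming every pair that passes the ObjectId filter is tested."""
    if predicate == "none":        # every ordered pair, including A = B
        return n * n
    if predicate == "<>":          # ordered pairs with A != B
        return n * (n - 1)
    if predicate in ("<", ">"):    # unordered pairs: each pair tested once
        return n * (n - 1) // 2
    raise ValueError(f"unknown predicate: {predicate}")


for n in (500, 100_000):
    print(n, distance_evaluations(n, "<>"), distance_evaluations(n, "<"))
# 500 249500 124750
# 100000 9999900000 4999950000
```

Switching from `<>` to `<` halves the work, but the growth is still quadratic. Going from 500 to 100,000 points multiplies the pair count by roughly 40,000.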

If the first part on its own (just the TOL check) performs well, have you considered splitting the second part out entirely?

Change the first part to dump its results into a temp table:

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST

into #AllDuplicatesWithRepeats

FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON 
    PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa

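Then you can write a straightforward query against it that skips the duplicates. It's nothing special, but against that small set in the temp table it should be very fast.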

Select
    *
from    
    #AllDuplicatesWithRepeats d1
where
    -- keep one row per pair: drops self-matches (a = b) and the mirrored copy (b, a)
    d1.objectIDa < d1.objectIDb
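Here is a sketch of the two-step approach, again using SQLite via Python's `sqlite3` as a stand-in. The `X`/`Y` columns and the squared-distance test are assumptions that replace SQL Server's `Geometry`/`STDistance`, and a SQLite `TEMP TABLE` plays the role of `#AllDuplicatesWithRepeats`. Step 1 stores every match, including self-matches and mirrored pairs. Step 2 keeps one row per pair with `ObjectIDa < ObjectIDb`, the same filter the asker settles on in the last post.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SurveyPoint (ObjectId INTEGER PRIMARY KEY, X REAL, Y REAL)")
con.executemany(
    "INSERT INTO SurveyPoint VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (4, 10.0, 10.0), (8, 10.03, 10.0), (9, 20.0, 5.0)],
)
TOL = 0.05

# Step 1: materialize every match, self-matches and mirrored pairs included.
con.execute("CREATE TEMP TABLE AllDuplicatesWithRepeats (ObjectIDa INTEGER, ObjectIDb INTEGER)")
con.execute(
    """
    INSERT INTO AllDuplicatesWithRepeats
    SELECT a.ObjectId, b.ObjectId
    FROM SurveyPoint a
    JOIN SurveyPoint b
      ON (a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y) < ?
    """,
    (TOL * TOL,),
)
all_rows = con.execute("SELECT COUNT(*) FROM AllDuplicatesWithRepeats").fetchone()[0]

# Step 2: keep one row per pair; ObjectIDa < ObjectIDb drops both the
# self-matches (a = b) and the mirrored copy of each pair.
unique_pairs = con.execute(
    "SELECT ObjectIDa, ObjectIDb FROM AllDuplicatesWithRepeats "
    "WHERE ObjectIDa < ObjectIDb ORDER BY ObjectIDa"
).fetchall()

print(all_rows)      # 6
print(unique_pairs)  # [(4, 8)]
```

Step 2 only touches the small result set from step 1. As a result, the cost of the whole approach is dominated by the distance join in step 1.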
answered 2013-12-30T23:27:24.247

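The execution plan is probably doing something behind the scenes when you add the ObjectID comparison. Check the execution plans to see whether the two versions of the query differ, for example one using an index seek and the other a table scan. If so, consider experimenting with query hints.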

As a workaround, you can always use a subquery:

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    ObjectIDa,
    PTNameA,
    PTdescA,
    ObjectIDb,
    PTNameB,
    PTdescB,
    DIST
FROM
(
SELECT 
  PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
   -- AND
   -- PointA.ObjectId <> PointB.ObjectID
) Subquery
WHERE ObjectIDa <> ObjectIDb
ORDER BY ObjectIDa
answered 2013-12-30T18:48:07.367

Try putting PointA.ObjectId <> PointB.ObjectID in a WHERE clause between the JOIN and the ORDER BY clauses.

Like this:

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
WHERE PointA.ObjectId <> PointB.ObjectID
ORDER BY ObjectIDa
answered 2013-12-30T04:55:30.887

Kudos to @Mike_M. Here is the edited SELECT, which runs in 2 seconds.

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST

into #AllDuplicatesWithRepeats

FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL  
ORDER BY ObjectIDa

Select
    *
from    
    #AllDuplicatesWithRepeats d1
Where
    d1.ObjectIDa < d1.ObjectIDb
answered 2013-12-31T03:20:51.693