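I'm trying to selectively delete records from a SQL Server 2005 table without looping through a cursor. The table can contain many records (sometimes > 500,000), so looping is too slow.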

Data:

ID  UnitID  Day  Interval  Amount
1   100     10   21         9.345
2   100     10   22         9.367
3   200     11   21         4.150
4   300     11   21         4.350
5   300     11   22         4.734
6   300     11   23         5.106
7   400     13   21        10.257
8   400     13   22        10.428

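The key is: ID, UnitID, Day, Interval.

In this example I want to delete records 2, 5 and 8, because each is adjacent to an existing record (based on the key).

Note: record 6 would not be deleted, because once 5 is gone it is no longer adjacent to anything.

Am I asking too much?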

5 Answers

For performance details, see the related articles on my blog.


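The main idea of the query below is to delete every even-numbered row within each run of consecutive intervals.

That is, suppose that for a given (UnitID, Day) we have the following intervals: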

1
2
3
4
6
7
8
9

These form two contiguous ranges:

1
2
3
4

6
7
8
9

Within each range we should delete every even row:

1
2 -- delete
3
4 -- delete

6
7 -- delete
8
9 -- delete

That leaves:

1
3
6
8

Note that "even row" here means an even ROW_NUMBER() within each range, not an even value of interval.

Here is the query:

DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.345)
INSERT INTO @Table VALUES (3, 200, 11, 21, 9.345)
INSERT INTO @Table VALUES (4, 300, 11, 21, 9.345)
INSERT INTO @Table VALUES (5, 300, 11, 22, 9.345)
INSERT INTO @Table VALUES (6, 300, 11, 23, 9.345)
INSERT INTO @Table VALUES (7, 400, 13, 21, 9.345)
INSERT INTO @Table VALUES (8, 400, 13, 22, 9.345)
INSERT INTO @Table VALUES (9, 400, 13, 23, 9.345)
INSERT INTO @Table VALUES (10, 400, 13, 24, 9.345)
INSERT INTO @Table VALUES (11, 400, 13, 26, 9.345)
INSERT INTO @Table VALUES (12, 400, 13, 27, 9.345)
INSERT INTO @Table VALUES (13, 400, 13, 28, 9.345)
INSERT INTO @Table VALUES (14, 400, 13, 29, 9.345)

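;WITH   rows AS
        (
        SELECT  *,
                ROW_NUMBER() OVER
                (
                PARTITION BY
                        -- Partition key: the ID of the row that starts the run of
                        -- consecutive intervals containing qo, i.e. the nearest row
                        -- at or before qo.interval with no row at interval - 1.
                        (
                        SELECT  TOP 1 qi.id AS mint
                        FROM    @Table qi
                        WHERE   qi.unitid = qo.unitid
                                AND qi.[day] = qo.[day]
                                AND qi.interval <= qo.interval
                                AND NOT EXISTS
                                (
                                SELECT  NULL
                                FROM    @Table t
                                WHERE   t.unitid = qi.unitid
                                        AND t.[day] = qi.[day]
                                        AND t.interval = qi.interval - 1
                                )
                        ORDER BY
                                qi.interval DESC
                        )
                ORDER BY interval
                ) AS rnm
        FROM    @Table qo
        )
-- Delete every even-numbered row of each run.
DELETE
FROM    rows
WHERE   rnm % 2 = 0

SELECT  *
FROM    @Table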

Update:

Here is a more efficient query:

DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.345)
INSERT INTO @Table VALUES (3, 200, 11, 21, 9.345)
INSERT INTO @Table VALUES (4, 300, 11, 21, 9.345)
INSERT INTO @Table VALUES (5, 300, 11, 22, 9.345)
INSERT INTO @Table VALUES (6, 300, 11, 23, 9.345)
INSERT INTO @Table VALUES (7, 400, 13, 21, 9.345)
INSERT INTO @Table VALUES (8, 400, 13, 22, 9.345)
INSERT INTO @Table VALUES (9, 400, 13, 23, 9.345)
INSERT INTO @Table VALUES (10, 400, 13, 24, 9.345)
INSERT INTO @Table VALUES (11, 400, 13, 26, 9.345)
INSERT INTO @Table VALUES (12, 400, 13, 27, 9.345)
INSERT INTO @Table VALUES (13, 400, 13, 28, 9.345)
INSERT INTO @Table VALUES (14, 400, 13, 29, 9.345)

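;WITH    source AS
        (
        -- rn numbers the rows of each (unitid, day) in interval order.
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY unitid, day ORDER BY interval) rn
        FROM    @Table
        ),
        rows AS
        (
        -- interval - rn is constant within a run of consecutive intervals,
        -- so it identifies the run; rnm numbers the rows inside each run.
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY unitid, day, interval - rn ORDER BY interval) AS rnm
        FROM    source
        )
DELETE
FROM    rows
WHERE   rnm % 2 = 0

SELECT  *
FROM    @Table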
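
To see why interval - rn separates the runs, it can help to preview the numbering before deleting anything. The following is a minimal sketch, not part of the original query: it applies the same two ROW_NUMBER() steps with a SELECT instead of a DELETE, using only the UnitID 400 rows. The CTE names ordered and grouped and the output columns grp and verdict are made up for illustration.

```sql
DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (7, 400, 13, 21, 9.345)
INSERT INTO @Table VALUES (8, 400, 13, 22, 9.345)
INSERT INTO @Table VALUES (9, 400, 13, 23, 9.345)
INSERT INTO @Table VALUES (10, 400, 13, 24, 9.345)
INSERT INTO @Table VALUES (11, 400, 13, 26, 9.345)
INSERT INTO @Table VALUES (12, 400, 13, 27, 9.345)
INSERT INTO @Table VALUES (13, 400, 13, 28, 9.345)
INSERT INTO @Table VALUES (14, 400, 13, 29, 9.345)

;WITH    ordered AS
        (
        -- Position of each row within its (UnitID, Day), by Interval
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY UnitID, [Day] ORDER BY Interval) AS rn
        FROM    @Table
        ),
        grouped AS
        (
        -- grp (hypothetical name) is the run identifier; rnm restarts at 1 in each run
        SELECT  *,
                Interval - rn AS grp,
                ROW_NUMBER() OVER (PARTITION BY UnitID, [Day], Interval - rn ORDER BY Interval) AS rnm
        FROM    ordered
        )
SELECT  ID, Interval, rn, grp, rnm,
        CASE WHEN rnm % 2 = 0 THEN 'delete' ELSE 'keep' END AS verdict
FROM    grouped
ORDER BY Interval
```

For intervals 21 to 24, grp is 20 and rnm runs from 1 to 4; for intervals 26 to 29, grp is 21 and rnm restarts at 1. IDs 8, 10, 12 and 14 are the rows marked 'delete'.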
answered 2009-08-03T15:37:31.073

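I don't think what you're asking for is possible, but you may be able to get close. It looks like you can almost do it by finding the records with a self-join like this: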

-- "table" stands in for the real table name
SELECT t1.id
FROM
  table t1 JOIN table t2 ON (
    t1.unitid = t2.unitid AND
    t1.day = t2.day AND
    -- t1 directly follows t2
    t1.interval = t2.interval + 1
  )

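The problem is that it will also find id=6. However, if you create a temporary table from this result, it will probably be much smaller than your original data, so scanning it with a cursor (to fix the id=6 problem) will be much faster. You can then run DELETE FROM table WHERE id IN (SELECT id FROM tmp_table) to remove the rows.

There may be a way to solve the id=6 problem without a cursor, but if so, I don't see it.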

answered 2009-08-03T15:14:41.610

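Look at the WHILE statement, which is an alternative to cursors. Combined with a table variable, it may let you do the same thing with performance you can accept.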
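
One possible reading of this suggestion, offered as an assumption rather than as the answerer's stated method, is to repeat a set-based DELETE until it stops removing rows. The sketch below deletes the second row of every run on each pass, so the number of passes grows with the longest run rather than with the row count. The variable @Deleted and the sample rows are made up for illustration, and ID is assumed to be unique.

```sql
DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.367)
INSERT INTO @Table VALUES (3, 100, 10, 23, 9.345)
INSERT INTO @Table VALUES (4, 100, 10, 24, 9.367)
INSERT INTO @Table VALUES (5, 100, 10, 25, 4.150)
INSERT INTO @Table VALUES (6, 100, 10, 26, 4.350)

DECLARE @Deleted INT
SET @Deleted = 1

WHILE @Deleted > 0
BEGIN
    -- One set-based pass: delete each row whose predecessor exists (t2)
    -- while the predecessor's own predecessor does not (t3),
    -- i.e. the second row of every run of consecutive intervals.
    DELETE FROM @Table
    WHERE ID IN (
      SELECT t1.ID
      FROM @Table t1
           INNER JOIN @Table t2
                ON  t2.UnitID = t1.UnitID
                    AND t2.[Day] = t1.[Day]
                    AND t2.Interval = t1.Interval - 1
           LEFT OUTER JOIN @Table t3
                ON  t3.UnitID = t2.UnitID
                    AND t3.[Day] = t2.[Day]
                    AND t3.Interval = t2.Interval - 1
      WHERE t3.ID IS NULL)

    -- Read @@ROWCOUNT immediately; any later statement would reset it.
    SET @Deleted = @@ROWCOUNT
END

SELECT * FROM @Table ORDER BY Interval
```

On this sample the loop removes ID 2, then ID 4, then ID 6, and a fourth pass deletes nothing and ends the loop, leaving intervals 21, 23 and 25. Data with very long runs would need many passes, so this is worth measuring before relying on it.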

answered 2009-08-03T15:18:52.320
DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.367)
INSERT INTO @Table VALUES (3, 200, 11, 21, 4.150)
INSERT INTO @Table VALUES (4, 300, 11, 21, 4.350)
INSERT INTO @Table VALUES (5, 300, 11, 22, 4.734)
INSERT INTO @Table VALUES (6, 300, 11, 23, 5.106)
INSERT INTO @Table VALUES (7, 400, 13, 21, 10.257)
INSERT INTO @Table VALUES (8, 400, 13, 22, 10.428)

DELETE FROM @Table
WHERE ID IN (
  SELECT t1.ID
  FROM @Table t1
       INNER JOIN @Table t2 
            ON  t2.UnitID = t1.UnitID 
                AND t2.Day = t1.Day 
                AND t2.Interval = t1.Interval - 1
       LEFT OUTER JOIN @Table t3 
            ON  t3.UnitID = t2.UnitID 
                AND t3.Day = t2.Day 
                AND t3.Interval = t2.Interval - 1
  WHERE t3.ID IS NULL)

SELECT * FROM @Table
answered 2009-08-03T15:19:01.557

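Lieven is so close: it works on the test set, but if I add a few more records it starts to miss some.

We can't use any odd/even criteria; we don't know how the data falls.

Add this data and retry: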

INSERT @Table VALUES (9,  100, 10, 23,  9.345)
INSERT @Table VALUES (10, 100, 10, 24,  9.367)
INSERT @Table VALUES (11, 100, 10, 25,  4.150)
INSERT @Table VALUES (12, 100, 10, 26,  4.350)
INSERT @Table VALUES (13, 300, 11, 25,  4.734)
INSERT @Table VALUES (14, 300, 11, 26,  5.106)
INSERT @Table VALUES (15, 300, 11, 27, 10.257)
INSERT @Table VALUES (16, 300, 11, 29, 10.428)
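
A quick way to see which rows a single-pass delete leaves behind is to list every row that still has a neighbour at Interval - 1. The sketch below is an illustration only: it loads the six UnitID 100 rows from the combined data, applies the three-way join delete once, and then runs that check.

```sql
DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1,  100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2,  100, 10, 22, 9.367)
INSERT INTO @Table VALUES (9,  100, 10, 23, 9.345)
INSERT INTO @Table VALUES (10, 100, 10, 24, 9.367)
INSERT INTO @Table VALUES (11, 100, 10, 25, 4.150)
INSERT INTO @Table VALUES (12, 100, 10, 26, 4.350)

-- Single pass: removes only the second row of the run 21..26 (ID 2)
DELETE FROM @Table
WHERE ID IN (
  SELECT t1.ID
  FROM @Table t1
       INNER JOIN @Table t2
            ON  t2.UnitID = t1.UnitID
                AND t2.[Day] = t1.[Day]
                AND t2.Interval = t1.Interval - 1
       LEFT OUTER JOIN @Table t3
            ON  t3.UnitID = t2.UnitID
                AND t3.[Day] = t2.[Day]
                AND t3.Interval = t2.Interval - 1
  WHERE t3.ID IS NULL)

-- Check: any row returned here still sits directly after another row
SELECT t1.ID, t1.Interval
FROM @Table t1
     INNER JOIN @Table t2
          ON  t2.UnitID = t1.UnitID
              AND t2.[Day] = t1.[Day]
              AND t2.Interval = t1.Interval - 1
ORDER BY t1.Interval
```

The check returns IDs 10, 11 and 12 (intervals 24, 25 and 26). Under the rule in the question, IDs 10 and 12 should have been deleted as well, after which ID 11 would no longer be adjacent to anything.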
answered 2009-08-03T16:04:39.347