A while ago I asked this question about deleting duplicate records based on a single column. The answer worked well:

delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by sourceid
)

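I now have a similar situation, but duplicates are defined by several columns. How can I change the SQL above to find duplicate records, where a unique record is defined by the combination Col1 + Col2 + Col3? Would I do something like this?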

delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by col1, col2, col3
)
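
You can check the multi-column version before running it against real data. The following is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical table tbl with integer columns col1, col2 and col3. The DELETE statement is the one from the question, unchanged:

```python
import sqlite3


def dedupe_with_group_by(rows):
    """Delete duplicates on (col1, col2, col3), keeping the lowest id per group.

    rows: list of (col1, col2, col3) tuples; ids are assigned 1, 2, 3, ...
    in insertion order. Returns the remaining (id, col1, col2, col3) rows.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; INTEGER PRIMARY KEY auto-assigns ids like IDENTITY(1, 1).
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 INT)"
    )
    conn.executemany("INSERT INTO tbl (col1, col2, col3) VALUES (?, ?, ?)", rows)
    # The statement from the question: every (col1, col2, col3) group,
    # including groups with a single row, contributes its MIN(id) to the keep list.
    conn.execute(
        """
        DELETE FROM tbl
        WHERE id NOT IN (
            SELECT MIN(id) FROM tbl GROUP BY col1, col2, col3
        )
        """
    )
    remaining = conn.execute(
        "SELECT id, col1, col2, col3 FROM tbl ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


sample = [(1, 2, 3), (1, 2, 3), (4, 5, 6), (7, 8, 9), (7, 8, 9)]
print(dedupe_with_group_by(sample))
# [(1, 1, 2, 3), (3, 4, 5, 6), (4, 7, 8, 9)]
```

Grouping by the three columns separately is also safer than grouping by a concatenated string. With strings, 'ab' + 'c' and 'a' + 'bc' concatenate to the same value even though the rows differ. One more point: GROUP BY places NULLs in the same group. As a result, rows with NULL in the same column (and equal values elsewhere) count as duplicates.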

2 Answers

This shows the rows you want to keep:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;

This shows the rows you want to delete:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn > 1;

Once you are satisfied that both sets above are correct, the following actually deletes the rows in the second set:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
DELETE x WHERE rn > 1;

Note that the first six lines (the CTE) are identical in all three queries; only the statement that follows the CTE changes.
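
The same preview-then-delete workflow can be tried outside SQL Server. What follows is a minimal sketch, not the answer's exact code. It assumes Python's sqlite3 module backed by SQLite 3.25 or later, which added window functions, and a hypothetical table tbl with integer columns. Two adaptations are needed. SQLite spells the alias `AS rn` instead of `rn =`. It also cannot DELETE through the CTE name as T-SQL does, so the sketch adds id to the CTE and deletes by id:

```python
import sqlite3

# Shared CTE, mirroring the answer's first six lines (SQLite dialect, id added
# so the DELETE can match rows by id).
CTE = """
WITH x AS (
    SELECT id, col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY id) AS rn
    FROM tbl
)
"""


def make_table(rows):
    """Create an in-memory hypothetical tbl holding (col1, col2, col3) rows; ids start at 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 INT)")
    conn.executemany("INSERT INTO tbl (col1, col2, col3) VALUES (?, ?, ?)", rows)
    return conn


def preview(conn):
    """Return (ids to keep, ids to delete), like the answer's two SELECT queries."""
    keep = [r[0] for r in conn.execute(CTE + "SELECT id FROM x WHERE rn = 1 ORDER BY id")]
    drop = [r[0] for r in conn.execute(CTE + "SELECT id FROM x WHERE rn > 1 ORDER BY id")]
    return keep, drop


def delete_duplicates(conn):
    """Delete every row with rn > 1 and return the surviving ids."""
    # T-SQL allows "DELETE x WHERE rn > 1"; in SQLite we match on id instead.
    conn.execute(CTE + "DELETE FROM tbl WHERE id IN (SELECT id FROM x WHERE rn > 1)")
    return [r[0] for r in conn.execute("SELECT id FROM tbl ORDER BY id")]


conn = make_table([(1, 2, 3), (1, 2, 3), (4, 5, 6), (7, 8, 9), (7, 8, 9)])
print(preview(conn))            # ([1, 3, 4], [2, 5])
print(delete_duplicates(conn))  # [1, 3, 4]
```

Because rows are numbered within each (col1, col2, col3) partition, every group keeps exactly one row, the lowest id, whether or not it has duplicates. Changing `ORDER BY id` to `ORDER BY id DESC` would keep the newest row instead.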

answered Jul 23, 2012 at 14:46

Try this. I created a table, tblA, with three data columns (plus an identity id):

CREATE TABLE tblA
(
id int IDENTITY(1, 1),
colA int, 
colB int, 
colC int
)

and added some duplicate values:

INSERT INTO tblA VALUES (1, 2, 3)
INSERT INTO tblA VALUES (1, 2, 3)
INSERT INTO tblA VALUES (4, 5, 6)
INSERT INTO tblA VALUES (7, 8, 9)
INSERT INTO tblA VALUES (7, 8, 9)

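The statement below returns the lowest id in each group of duplicated rows, i.e. the row to keep in each group, and your multi-column delete can be built on it. Rows that have no duplicates do not appear in its result, so make sure the delete leaves those rows alone as well.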

SELECT MIN(Id) as id
FROM
(
SELECT COUNT(*) as aantal, a.colA, a.colB, a.colC  -- aantal is Dutch for "count"
FROM tblA       a
INNER JOIN tblA b   ON b.ColA = a.ColA
                    AND b.ColB = a.ColB
                    AND b.ColC = a.ColC
GROUP BY a.id, a.colA, a.colB, a.colC
HAVING COUNT(*) > 1
) c
INNER JOIN tblA d ON d.ColA = c.ColA
                    AND d.ColB = c.ColB
                    AND d.ColC = c.ColC
GROUP BY d.colA, d.colB, d.colC
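
Here is a minimal sketch of what this query returns on the sample data above. It assumes Python's sqlite3 module as a stand-in for SQL Server, with INTEGER PRIMARY KEY in place of IDENTITY(1, 1). The query text is used unchanged. The sketch also compares two keep lists inside `DELETE ... WHERE id NOT IN (...)`: this query, and the question's `GROUP BY` version. The helper names are hypothetical:

```python
import sqlite3

# The answer's query, unchanged.
ANSWER_QUERY = """
SELECT MIN(Id) as id
FROM
(
SELECT COUNT(*) as aantal, a.colA, a.colB, a.colC
FROM tblA       a
INNER JOIN tblA b   ON b.ColA = a.ColA
                    AND b.ColB = a.ColB
                    AND b.ColC = a.ColC
GROUP BY a.id, a.colA, a.colB, a.colC
HAVING COUNT(*) > 1
) c
INNER JOIN tblA d ON d.ColA = c.ColA
                    AND d.ColB = c.ColB
                    AND d.ColC = c.ColC
GROUP BY d.colA, d.colB, d.colC
"""

# The question's keep list: MIN(id) of every group, duplicated or not.
GROUP_BY_QUERY = "SELECT MIN(id) FROM tblA GROUP BY colA, colB, colC"

SAMPLE = [(1, 2, 3), (1, 2, 3), (4, 5, 6), (7, 8, 9), (7, 8, 9)]


def make_tbla():
    """Build tblA with the answer's sample rows (ids 1..5)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (id INTEGER PRIMARY KEY, colA INT, colB INT, colC INT)")
    conn.executemany("INSERT INTO tblA (colA, colB, colC) VALUES (?, ?, ?)", SAMPLE)
    return conn


def remaining_after_not_in(keep_sql):
    """Delete rows whose id is not in keep_sql's result; return surviving ids."""
    conn = make_tbla()
    conn.execute(f"DELETE FROM tblA WHERE id NOT IN ({keep_sql})")
    return [r[0] for r in conn.execute("SELECT id FROM tblA ORDER BY id")]


print(sorted(r[0] for r in make_tbla().execute(ANSWER_QUERY)))  # [1, 4]
print(remaining_after_not_in(ANSWER_QUERY))    # [1, 4]: row 3 (4, 5, 6) is removed too
print(remaining_after_not_in(GROUP_BY_QUERY))  # [1, 3, 4]
```

The HAVING COUNT(*) > 1 filter is what drops the unique row (id 3) from the keep list. A plain `NOT IN` around this query therefore deletes it along with the real duplicates.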

answered Nov 27, 2012 at 08:40