sql - Deleting "duplicate" rows in SQL Server 2010

Question

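I made a mistake in a bulk-insert script, so now I have "duplicate" rows that differ only in colX. I need to delete these duplicate rows, but I don't know how. More precisely, I have this: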

 col1 | col2 | col3 | colX
------+------+------+------
  0   |  1   |  2   |  a
  0   |  1   |  2   |  b
  0   |  1   |  2   |  c
  0   |  1   |  2   |  a
  3   |  4   |  5   |  x
  3   |  4   |  5   |  y
  3   |  4   |  5   |  x
  3   |  4   |  5   |  z

I want to keep only the first occurrence of each (col1, col2, col3) row, together with its colX:

 col1 | col2 | col3 | colX
------+------+------+------
  0   |  1   |  2   |  a
  3   |  4   |  5   |  x

Thanks for your replies :)

score 10 · Accepted Answer

Try the simplest approach, using a SQL Server CTE: http://www.sqlfiddle.com/#!3/2d386/2

Data:

CREATE TABLE tbl
    ([col1] int, [col2] int, [col3] int, [colX] varchar(1));

INSERT INTO tbl
    ([col1], [col2], [col3], [colX])
VALUES
    (0, 1, 2, 'a'),
    (0, 1, 2, 'b'),
    (0, 1, 2, 'c'),
    (0, 1, 2, 'a'),
    (3, 4, 5, 'x'),
    (3, 4, 5, 'y'),
    (3, 4, 5, 'x'),
    (3, 4, 5, 'z');

Solution:

select * from tbl;

with a as
(
  select row_number() over(partition by col1 order by col2, col3, colX) as rn 
  from tbl   
)
delete from a where rn > 1;

select * from tbl;

Output (before and after the delete):

| COL1 | COL2 | COL3 | COLX |
-----------------------------
|    0 |    1 |    2 |    a |
|    0 |    1 |    2 |    b |
|    0 |    1 |    2 |    c |
|    0 |    1 |    2 |    a |
|    3 |    4 |    5 |    x |
|    3 |    4 |    5 |    y |
|    3 |    4 |    5 |    x |
|    3 |    4 |    5 |    z |


| COL1 | COL2 | COL3 | COLX |
-----------------------------
|    0 |    1 |    2 |    a |
|    3 |    4 |    5 |    x |

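Or this one, which partitions by all three key columns, so that a row such as (0, 1, 3, 'a') that differs only in col3 is kept: http://www.sqlfiddle.com/#!3/af826/1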

Data:

CREATE TABLE tbl
    ([col1] int, [col2] int, [col3] int, [colX] varchar(1));

INSERT INTO tbl
    ([col1], [col2], [col3], [colX])
VALUES
    (0, 1, 2, 'a'),
    (0, 1, 2, 'b'),
    (0, 1, 2, 'c'),
    (0, 1, 2, 'a'),
    (0, 1, 3, 'a'),
    (3, 4, 5, 'x'),
    (3, 4, 5, 'y'),
    (3, 4, 5, 'x'),
    (3, 4, 5, 'z');

Solution:

select * from tbl;


with a as
(
    select row_number() over(partition by col1, col2, col3 order by colX) as rn 
    from tbl   
)
delete from a where rn > 1;

select * from tbl;

Output (before and after the delete):

| COL1 | COL2 | COL3 | COLX |
-----------------------------
|    0 |    1 |    2 |    a |
|    0 |    1 |    2 |    b |
|    0 |    1 |    2 |    c |
|    0 |    1 |    2 |    a |
|    0 |    1 |    3 |    a |
|    3 |    4 |    5 |    x |
|    3 |    4 |    5 |    y |
|    3 |    4 |    5 |    x |
|    3 |    4 |    5 |    z |

| COL1 | COL2 | COL3 | COLX |
-----------------------------
|    0 |    1 |    2 |    a |
|    0 |    1 |    3 |    a |
|    3 |    4 |    5 |    x |

score 2 · Accepted Answer

If it's acceptable to keep only the minimum colX value, you can do this:

delete t from t inner join 
    (select  min(colx) mincolx, col1, col2, col3
     from t
     group by col1, col2, col3
     having count(1) > 1) as duplicates
   on (duplicates.col1 = t.col1
   and duplicates.col2 = t.col2
   and duplicates.col3 = t.col3
   and duplicates.mincolx <> t.colx)

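The problem is that you still have rows in which all four columns are identical. To get rid of those, after running the first query, you have to go through a temporary table: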

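-- Copy one instance of each remaining duplicate row into a holding table
SELECT DISTINCT t.col1, t.col2, t.col3, t.colx
INTO temp
  FROM t
  JOIN (SELECT col1, col2, col3
          from t
         group by col1, col2, col3
        having count(1) > 1) subq
    ON subq.col1 = t.col1
   AND subq.col2 = t.col2
   AND subq.col3 = t.col3;

-- Delete every copy of those rows from t
DELETE from t where exists
   (select 1 from temp
     where temp.col1 = t.col1
       and temp.col2 = t.col2
       and temp.col3 = t.col3);

-- Put back a single copy of each row, then drop the holding table
INSERT INTO t (col1, col2, col3, colx)
SELECT col1, col2, col3, colx FROM temp;
DROP TABLE temp;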

Here is an example SQLFiddle.

score 2 · Accepted Answer

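If you have a lot of duplicates, I'd recommend using a CTE and reading all the non-duplicate records into a separate table. There is also a recommended post on this: MSDN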
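A minimal sketch of that idea, assuming SQL Server 2008 or later (for the multi-row VALUES list) and using hypothetical temp tables #tbl and #tbl_dedup loaded with the question's sample data. It keeps the row with the smallest colX in each (col1, col2, col3) group, copies those rows into a separate table, and then swaps them back into the original:

```sql
-- Hypothetical scratch table holding the question's sample data
CREATE TABLE #tbl (col1 int, col2 int, col3 int, colX varchar(1));
INSERT INTO #tbl (col1, col2, col3, colX)
VALUES (0, 1, 2, 'a'), (0, 1, 2, 'b'), (0, 1, 2, 'c'), (0, 1, 2, 'a'),
       (3, 4, 5, 'x'), (3, 4, 5, 'y'), (3, 4, 5, 'x'), (3, 4, 5, 'z');

-- Number the rows inside each (col1, col2, col3) group and copy only the
-- first row of each group into a separate table (#tbl_dedup, hypothetical)
WITH numbered AS
(
    SELECT col1, col2, col3, colX,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY colX) AS rn
    FROM #tbl
)
SELECT col1, col2, col3, colX
INTO #tbl_dedup
FROM numbered
WHERE rn = 1;

-- Replace the original contents with the de-duplicated rows
TRUNCATE TABLE #tbl;
INSERT INTO #tbl (col1, col2, col3, colX)
SELECT col1, col2, col3, colX FROM #tbl_dedup;
DROP TABLE #tbl_dedup;

SELECT col1, col2, col3, colX FROM #tbl;
```

On a real table you would probably wrap the TRUNCATE and INSERT in a transaction, so that a failure halfway through cannot leave the table empty.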

score 1 · Accepted Answer

Assuming colX is unique (which it isn't in your example, even though you say "different colX"), you can use the following to delete the duplicates:

;with cteDuplicates as
(
    select 
        *,
        row_number() over (partition by col1, col2, col3 order by colX) as ID
    from Duplicates
)
delete D from Duplicates D
    inner join cteDuplicates C on C.colX = D.Colx
where ID > 1

(assuming your table is named "Duplicates")

If colX is not unique, add a new uniqueidentifier column, fill it with distinct values, and then run the code above, joining on that column instead of colX.
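A sketch of that last suggestion, assuming SQL Server 2008 or later and a client that understands the GO batch separator (SSMS or sqlcmd). The scratch table #Duplicates and the column name RowGuid are hypothetical. The GO is there so that the new column already exists when the batch that references it is compiled:

```sql
-- Hypothetical scratch copy of the question's data
CREATE TABLE #Duplicates (col1 int, col2 int, col3 int, colX varchar(1));
INSERT INTO #Duplicates (col1, col2, col3, colX)
VALUES (0, 1, 2, 'a'), (0, 1, 2, 'b'), (0, 1, 2, 'c'), (0, 1, 2, 'a'),
       (3, 4, 5, 'x'), (3, 4, 5, 'y'), (3, 4, 5, 'x'), (3, 4, 5, 'z');

-- Add a uniqueidentifier column; because it is NOT NULL with a NEWID()
-- default, every existing row is filled with its own new GUID
ALTER TABLE #Duplicates
    ADD RowGuid uniqueidentifier NOT NULL DEFAULT NEWID();
GO

-- Number rows within each (col1, col2, col3) group, carrying RowGuid along
WITH cteDuplicates AS
(
    SELECT RowGuid,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY colX) AS ID
    FROM #Duplicates
)
-- Join on the unique RowGuid instead of colX, so only the extra copies go
DELETE D
FROM #Duplicates D
    INNER JOIN cteDuplicates C ON C.RowGuid = D.RowGuid
WHERE C.ID > 1;

SELECT col1, col2, col3, colX FROM #Duplicates;
```

Afterwards you would normally drop RowGuid again; on a real table that means dropping its default constraint before dropping the column.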

score 0 · Accepted Answer

The simplest solution is probably the following. Suppose we have a table emp_dept(empid, deptid) with duplicate rows, in an Oracle database:

  delete from emp_dept
   where exists (select *
                   from emp_dept i
                  where i.empid = emp_dept.empid
                    and i.deptid = emp_dept.deptid
                    and i.rowid < emp_dept.rowid)

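On SQL Server, or any database that lacks a rowid-like feature, we need to add an identity column to identify each row. Suppose we have added nid to the table as an identity column: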

alter table emp_dept add nid int identity(1,1) -- to add identity column

Now the query to delete the duplicates can be written as:

  delete from emp_dept
   where exists (select *
                   from emp_dept i
                  where i.empid = emp_dept.empid
                    and i.deptid = emp_dept.deptid
                    and i.nid < emp_dept.nid)

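The idea is to delete every row for which another row exists with the same key values but a smaller rowid or identity value. So if a row has duplicates, the copies with the higher rowid or identity value are deleted, while a row without duplicates has no matching row with a lower ID and is therefore kept.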
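A self-contained run of the SQL Server variant above, assuming SQL Server 2008 or later and a GO-aware client (SSMS or sqlcmd). The #emp_dept temp table and its values are made up for illustration; the delete is the same EXISTS test as above, written with a table alias:

```sql
-- Hypothetical emp_dept data containing duplicated rows
CREATE TABLE #emp_dept (empid int, deptid int);
INSERT INTO #emp_dept (empid, deptid)
VALUES (1, 10), (1, 10), (2, 20), (2, 20), (2, 20), (3, 30);

-- Add an identity column so every row can be told apart
ALTER TABLE #emp_dept ADD nid int IDENTITY(1, 1);
GO

-- Delete each row for which an identical row with a smaller nid exists
DELETE o
FROM #emp_dept AS o
WHERE EXISTS (SELECT *
              FROM #emp_dept AS i
              WHERE i.empid = o.empid
                AND i.deptid = o.deptid
                AND i.nid < o.nid);

SELECT empid, deptid, nid FROM #emp_dept ORDER BY empid;
```

Only the row with the lowest nid in each (empid, deptid) group survives, which is the rule described in the previous paragraph.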

score 0 · Accepted Answer

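Try this code, but at your own risk (%%physloc%% is an undocumented SQL Server virtual column that exposes a row's physical location):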

Delete from Table_name
WHERE Table_name.%%physloc%%
      NOT IN (SELECT MAX(b.%%physloc%%)
              FROM   Table_name b
              group by Col_1,Col_2)

A second approach, using row_number(), is the safe way:

WITH CTE_Dup AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SalesOrderno, ItemNo
                              ORDER BY SalesOrderno, ItemNo) AS ROW_NO
    FROM dbo.SalesOrderDetails
)
-- Keep the first row of each (SalesOrderno, ItemNo) group, delete the rest
DELETE FROM CTE_Dup
WHERE ROW_NO > 1;

score 0 · Accepted Answer

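I assume you are using SQL Server 2005/2008. The following query returns one row per (col1, col2, col3), the one with the smallest colx; note that it only selects and does not delete anything: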

SELECT col1,
       col2,
       col3,
       colx
FROM
  (SELECT *,
          row_number() OVER (PARTITION BY col1,col2,col3
                             ORDER BY colx) AS r
   FROM table_name) a
WHERE r = 1;
