sql-server-2008 - Delete duplicate rows

Question

I need to delete all duplicate rows:

t1
--------------------
col1    col2    col3
1       a       b
2       a       c
3       a       b

In this example, rows 1 and 3 are duplicates. I need to insert both of them into another table and then delete them from the current table.

t1
--------------------
col1    col2    col3
1       a       c

t2
--------------------
col1    col2    col3
1       a       b
2       a       b

What is the best way to do this?

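Edit

I should have provided more information. t1 is a temporary table that holds imported rows. Four fields uniquely identify a record, and each row has 20-plus additional fields. If there are duplicates, they need to be inserted into a different table for review. So I don't think the identity values need to be preserved, because once the rows are inserted into the system, the values in the temporary table are no longer useful.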

score 0 · Accepted Answer

INSERT INTO T2(Col2, Col3)
SELECT Col2, Col3
FROM T1
WHERE EXISTS (  SELECT * 
                FROM T1 AS T 
                WHERE   T.Col2 = T1.Col2
                    AND T.Col3 = T1.Col3
                    AND T.Col1 <> T1.Col1
            )

DELETE FROM T1 
WHERE EXISTS (  SELECT * 
                FROM T2 
                WHERE   T2.Col2 = T1.Col2
                    AND T2.Col3 = T1.Col3
            )

score 0 · Accepted Answer

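-- Copy one row per distinct (column_name1, column_name2) pair into a temporary table
SELECT column_name1, column_name2
INTO #temp
FROM tablename
GROUP BY column_name1, column_name2

-- The data is inserted into the temporary table without any duplicates

DROP TABLE tablename

SELECT * INTO tablename FROM #temp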

score 0 · Accepted Answer

Got it working.

Insert all duplicate records into t2.

insert into t2
select src.col2, src.col3 from t1 src
inner join (select t1.col2, t1.col3 from t1
            group by t1.col2, t1.col3
            having count(*) > 1) duplicates 
on src.col2 = duplicates.col2 and src.col3 = duplicates.col3

Delete the duplicates from t1.

delete from t1
where t1.col1 in (
    select src.col1 from t1 src
        inner join (
                    select t1.col2, t1.col3 from t1
                    group by t1.col2, t1.col3
                    having count(*) > 1) duplicates
                    on src.col2 = duplicates.col2 and src.col3 = duplicates.col3
)

score 0 · Accepted Answer

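The following code is useful for deleting duplicate records. The table must have an identity column, which is used to identify the duplicate records. The table in the example has ID as its identity column, and the columns with duplicate data are DuplicateColumn1, DuplicateColumn2, and DuplicateColumn3. The statement keeps the row with the highest ID in each group and deletes the rest.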

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Source: http://blog.sqlauthority.com/2007/03/01/sql-server-delete-duplicate-records-rows/

score 0 · Accepted Answer

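One way to find all the duplicate rows is to join the table to itself on all the columns that may hold duplicate data, and filter out the rows where the col1 values are the same, like this: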

select distinct a.col1
from t1 a inner join t1 b on a.col2 = b.col2 and a.col3 = b.col3
where a.col1 <> b.col1

You would use this to insert from t1 into t2 (per my comment on your question, assuming you want to keep the col1 values from t1 in t2):

insert into t2 (col1, col2, col3)
select col1, col2, col3
from t1
where col1 in (
    select distinct a.col1
    from t1 a inner join t1 b on a.col2 = b.col2 and a.col3 = b.col3
    where a.col1 <> b.col1
)

Then delete from t1:

delete t1
where col1 in (
    select distinct a.col1
    from t1 a inner join t1 b on a.col2 = b.col2 and a.col3 = b.col3
    where a.col1 <> b.col1
)

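This could be simplified by using a temp table to hold the col1 values, so that you don't have to do the self-join a second time. Using a temp table would also be safer. Because the two separate queries each perform the self-join, it is (remotely) possible to delete rows from t1 that were never inserted into t2 (i.e., if new duplicates are written to t1 between your insert into t2 and your delete from t1, the newly inserted rows will match in the second self-join).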
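A minimal sketch of the temp-table variant, under a few assumptions: `#dup_ids` is a hypothetical temp table name, t2.col1 is a plain (non-identity) column so the col1 values can be inserted directly, and the transaction wrapper is my addition so that the insert and the delete succeed or fail together.

```sql
BEGIN TRANSACTION;

-- Capture the col1 values of every duplicated row once.
-- #dup_ids is a hypothetical name chosen for this sketch.
SELECT DISTINCT a.col1
INTO #dup_ids
FROM t1 a
INNER JOIN t1 b ON a.col2 = b.col2 AND a.col3 = b.col3
WHERE a.col1 <> b.col1;

-- Copy the captured rows to t2 (assumes t2.col1 is not an identity column).
INSERT INTO t2 (col1, col2, col3)
SELECT col1, col2, col3
FROM t1
WHERE col1 IN (SELECT col1 FROM #dup_ids);

-- Delete exactly the rows that were captured above, and no others.
DELETE FROM t1
WHERE col1 IN (SELECT col1 FROM #dup_ids);

COMMIT TRANSACTION;

DROP TABLE #dup_ids;
```

Because both statements read the same saved list of col1 values, a row written to t1 after the list was built is neither copied nor deleted.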

Also, for the delete, you could use t2 instead of doing the self-join on t1 again (again, if my assumption is correct and you are keeping the col1 values in t2).
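As a rough sketch of that last idea, the delete could look like the following. It assumes t2 holds the col1 values copied from t1 and contains only rows from the current import; if t2 also holds older rows whose col1 values could collide with new ones, this would delete too much.

```sql
-- Delete from t1 every row whose col1 value was already copied into t2.
-- Assumes t2.col1 carries the original t1.col1 values.
DELETE FROM t1
WHERE col1 IN (SELECT col1 FROM t2);
```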
