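I just need to delete duplicate user records from my database. My C# code is below, but I would like to know how to do this in SQL without using a cursor. I think the trick starts with getting, for each set of duplicates partitioned by email, either the first row or the remaining rows.

In C#, I collect the duplicated emails in batches of 1000 and, for each email, delete the remaining rows after skipping the first one.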

List<string> top1000_emails;
do
{
  // Fetch the next batch of up to 1000 emails that occur more than once.
  top1000_emails = sql.dbCommand.GetFirstColumn<string>(@"select top 1000 email
      from UserBase
      group by email
      having COUNT(email) > 1");

  for (int i = 0; i < top1000_emails.Count; i++)
  {
     // Order by Id so that Skip(1) always keeps the row with the lowest Id.
     var tmpids = sql.dbCommand.GetFirstColumn<long>("select [Id] from UserBase where email = {0} order by [Id]", top1000_emails[i]).Skip(1);
     sql.dbCommand.DeleteByIds<UserBase>(tmpids);
  }
} while (top1000_emails.Count > 0);

3 Answers

You can do it simply in SQL, like this (if you have SQL Server 2005 or later):

;WITH a AS (
    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY email ORDER BY Id) RowNum
    FROM    UserBase
)
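-- Preview the rows that would be deleted (every row except the lowest Id per email).
-- To actually delete them, replace SELECT * with DELETE.
SELECT  *
--DELETE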
FROM    a
WHERE   a.RowNum <> 1
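
To try the ROW_NUMBER() idea end to end without touching a real database, here is a minimal sketch in Python using the standard sqlite3 module. SQLite stands in for SQL Server purely so the example is self-contained (that is an assumption, not part of the question), and it needs SQLite 3.25 or later for window functions. SQLite cannot delete through a CTE the way SQL Server can, so the ranked rows are picked in a subquery and deleted by Id instead. The sample rows and the names build_sample_db and dedupe_with_row_number are made up for illustration.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory UserBase table filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserBase (Id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO UserBase (Id, email) VALUES (?, ?)",
        [
            (1, "a@example.com"),
            (2, "b@example.com"),
            (3, "a@example.com"),
            (4, "a@example.com"),
            (5, "c@example.com"),
            (6, "b@example.com"),
        ],
    )
    return conn


def dedupe_with_row_number(conn):
    """Delete every row except the lowest Id per email; return rows deleted."""
    cur = conn.execute(
        """
        DELETE FROM UserBase
        WHERE Id IN (
            SELECT Id
            FROM (
                SELECT Id,
                       ROW_NUMBER() OVER (PARTITION BY email ORDER BY Id) AS RowNum
                FROM UserBase
            ) AS ranked
            WHERE RowNum <> 1
        )
        """
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_sample_db()
    print("deleted:", dedupe_with_row_number(conn))
    print(conn.execute("SELECT Id, email FROM UserBase ORDER BY Id").fetchall())
```

Ordering by Id inside the window is what makes the surviving row predictable: RowNum 1 is always the oldest Id for that email.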
answered 2012-07-12T09:55:57.467

Something like this...

 --delete userbase 
 select * 
 from userbase
    left join (select email, MIN(id) minid from userbase group by email) mins
    on userbase.id = mins.minid
    and userbase.email = mins.email
 where mins.email is null

Back up your data first, just in case, then replace "select *" with "delete userbase".
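
The keep-the-lowest-id idea can also be tried out on throwaway data before running it for real. The following is a minimal sketch in Python with the standard sqlite3 module; SQLite is an assumption chosen only so the example runs standalone, since the thread itself is about SQL Server. Because SQLite has no DELETE with a JOIN, the sketch expresses the same condition as NOT IN over MIN(id) per email. The sample rows and the names build_sample_db and delete_all_but_min_id are hypothetical.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory userbase table filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE userbase (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO userbase (id, email) VALUES (?, ?)",
        [
            (10, "x@example.com"),
            (11, "x@example.com"),
            (12, "y@example.com"),
            (13, "x@example.com"),
        ],
    )
    return conn


def delete_all_but_min_id(conn):
    """Keep the row with the smallest id per email and delete the others."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            """
            DELETE FROM userbase
            WHERE id NOT IN (SELECT MIN(id) FROM userbase GROUP BY email)
            """
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_sample_db()
    print("deleted:", delete_all_but_min_id(conn))
    print(conn.execute("SELECT id, email FROM userbase ORDER BY id").fetchall())
```

Running the delete inside a transaction, as the with block does here, gives a chance to roll back if something fails partway through.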

answered 2012-07-12T09:54:04.750

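Assuming your UserBase table has a primary key ID: create a UserBase_Unique table with exactly the same structure as UserBase and run the command below. The UserBase_Unique table will then hold the result you are looking for.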

-- Copy one row per email, keeping the lowest ID.
-- Emails that appear only once are copied unchanged.
INSERT INTO UserBase_Unique (ID, Email)
SELECT MIN(ID), email
FROM UserBase
GROUP BY email;
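
To check what the copy-into-a-new-table approach produces, here is a minimal sketch in Python with the standard sqlite3 module. SQLite is an assumption used only to keep the example self-contained, and the two-column table layout, the sample rows and the names build_sample_db and copy_unique_users are made up for illustration. It copies one row per email (the lowest ID) into UserBase_Unique and leaves UserBase untouched.

```python
import sqlite3


def build_sample_db():
    """Create in-memory UserBase and UserBase_Unique tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserBase (ID INTEGER PRIMARY KEY, Email TEXT)")
    conn.execute("CREATE TABLE UserBase_Unique (ID INTEGER PRIMARY KEY, Email TEXT)")
    conn.executemany(
        "INSERT INTO UserBase (ID, Email) VALUES (?, ?)",
        [
            (1, "a@example.com"),
            (2, "a@example.com"),
            (3, "b@example.com"),
            (4, "c@example.com"),
            (5, "c@example.com"),
        ],
    )
    return conn


def copy_unique_users(conn):
    """Fill UserBase_Unique with the lowest ID per email; return its row count."""
    conn.execute(
        """
        INSERT INTO UserBase_Unique (ID, Email)
        SELECT MIN(ID), Email
        FROM UserBase
        GROUP BY Email
        """
    )
    return conn.execute("SELECT COUNT(*) FROM UserBase_Unique").fetchone()[0]


if __name__ == "__main__":
    conn = build_sample_db()
    print("copied:", copy_unique_users(conn))
    print(conn.execute("SELECT ID, Email FROM UserBase_Unique ORDER BY ID").fetchall())
```

Since the original table is never modified, this variant is easy to verify before swapping the tables, at the cost of temporarily storing the data twice.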
answered 2012-07-12T10:46:27.267