
I have to clean up a table that has duplicate rows:

id: serial id
gid: group id
url: string <- this is the column that I have to clean up

A gid may have several url values:

id    gid   url
----  ----  ------------
1     12    www.gmail.com
2     12    www.some.com
3     12    www.some.com <-- duplicate
4     13    www.other.com
5     13    www.milfsome.com <-- not a duplicate

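I want to run a single query over the whole table that deletes every row whose (gid, url) combination is a duplicate. In the example above, after the delete I want only rows 1, 2, 4 and 5 to remain.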


2 Answers

;WITH x AS 
(
   -- number the rows inside each (gid, url) group, lowest id first
   SELECT id, gid, url, rn = ROW_NUMBER() OVER
     (PARTITION BY gid, url ORDER BY id) 
   FROM dbo.yourTable -- placeholder name; TABLE is a reserved word in T-SQL
)
SELECT id,gid,url FROM x WHERE rn = 1 -- the rows you'll keep
-- SELECT id,gid,url FROM x WHERE rn > 1 -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete

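Once you're happy with the first SELECT, which shows the rows you'll keep, remove it and uncomment the second SELECT. Once you're happy with that one, which shows the rows you'll delete, remove it and uncomment the DELETE.

If you don't want to actually delete the data, just keep the first SELECT, which returns the de-duplicated rows, and skip the DELETE.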
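To try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (it assumes the bundled SQLite is 3.25 or later, which is needed for window functions). The table name `urls` and the in-memory database are hypothetical stand-ins. One deliberate difference from the answer: SQLite cannot DELETE through a CTE the way SQL Server can, so the numbered query is wrapped in a subquery that selects the ids to remove.

```python
import sqlite3

# Sample rows from the question; "urls" is a hypothetical table name.
ROWS = [
    (1, 12, "www.gmail.com"),
    (2, 12, "www.some.com"),
    (3, 12, "www.some.com"),      # duplicate of id 2
    (4, 13, "www.other.com"),
    (5, 13, "www.milfsome.com"),  # same gid, different url: not a duplicate
]


def dedupe_with_row_number(conn: sqlite3.Connection) -> None:
    """Delete every row whose (gid, url) pair already appeared with a lower id."""
    # Number the rows within each (gid, url) group, lowest id first,
    # then delete everything except the first row of each group.
    conn.execute(
        """
        DELETE FROM urls
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY gid, url ORDER BY id) AS rn
                FROM urls
            ) AS numbered
            WHERE rn > 1
        )
        """
    )


def main() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany("INSERT INTO urls VALUES (?, ?, ?)", ROWS)
    dedupe_with_row_number(conn)
    remaining = [row[0] for row in conn.execute("SELECT id FROM urls ORDER BY id")]
    conn.close()
    return remaining


if __name__ == "__main__":
    print(main())
```

Running it prints `[1, 2, 4, 5]`, the rows the question expects to keep.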

answered 2013-04-04T14:43:43.603
SELECT 
  MIN(id) AS id, -- the lowest id in each (gid, url) group
  gid,
  url
FROM yourTable
GROUP BY gid, url 
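This query only reads the data; it returns the rows to keep but deletes nothing. A hedged sketch of turning the same grouping into a delete is to remove every id that is not its group's `MIN(id)`. It again uses Python's sqlite3 with a hypothetical `urls` table standing in for `yourTable`, and it relies on `id` never being NULL, since `NOT IN` matches nothing when the subquery yields a NULL.

```python
import sqlite3

# Sample rows from the question; "urls" is a hypothetical table name.
ROWS = [
    (1, 12, "www.gmail.com"),
    (2, 12, "www.some.com"),
    (3, 12, "www.some.com"),
    (4, 13, "www.other.com"),
    (5, 13, "www.milfsome.com"),
]


def make_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany("INSERT INTO urls VALUES (?, ?, ?)", ROWS)
    return conn


def rows_to_keep(conn: sqlite3.Connection) -> list:
    # The answer's read-only query: one row per (gid, url) with the lowest id.
    return conn.execute(
        "SELECT MIN(id) AS id, gid, url FROM urls GROUP BY gid, url ORDER BY id"
    ).fetchall()


def delete_duplicates(conn: sqlite3.Connection) -> None:
    # Same grouping as a DELETE: drop every id that is not its group's minimum.
    conn.execute(
        "DELETE FROM urls WHERE id NOT IN "
        "(SELECT MIN(id) FROM urls GROUP BY gid, url)"
    )


conn = make_table()
kept = rows_to_keep(conn)
delete_duplicates(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM urls ORDER BY id")]
print(kept)
print(remaining)
```

Both the SELECT and the DELETE settle on ids 1, 2, 4 and 5, so the SELECT can serve as a dry run before the DELETE, much like the staged workflow in the first answer.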
answered 2013-04-04T14:44:57.217