I have a very large contacts table, and I am building an interface to help my customers deduplicate it. Here is a sample of the table's contents:

id | firstname | lastname | email             | address1 | addres2 | verifiedAt |
1  | James     | johnson  | james@test.com    |          |         |            |
2  | David     | bloggs   | james@bloggs.com  |          |         |            |
3  | John      | nobel    | james@nobel.com   |          |         |            |
4  | Terry     | jacket   | james@jacket.com  |          |         | 05/05/2013 |
5  | James     | johnson  | james@johnson.com |          |         |            |
6  | James     | privett  | james@test.com    |          |         |            |
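
For anyone who wants to reproduce this, the sample rows could be loaded with something like the script below. It is only a sketch: the table name mytable, the column types, and the use of NULL for the empty cells are assumptions, and the date is written in ISO form.

```sql
-- Hypothetical setup script; names and types are assumed, not taken from the real schema.
CREATE TABLE mytable (
    id         INT PRIMARY KEY,
    firstname  VARCHAR(50),
    lastname   VARCHAR(50),
    email      VARCHAR(100),
    address1   VARCHAR(100),
    addres2    VARCHAR(100),   -- column name spelled as in the sample above
    verifiedAt DATE
);

-- Empty cells in the sample are assumed to be NULL.
INSERT INTO mytable (id, firstname, lastname, email, address1, addres2, verifiedAt) VALUES
(1, 'James', 'johnson', 'james@test.com',    NULL, NULL, NULL),
(2, 'David', 'bloggs',  'james@bloggs.com',  NULL, NULL, NULL),
(3, 'John',  'nobel',   'james@nobel.com',   NULL, NULL, NULL),
(4, 'Terry', 'jacket',  'james@jacket.com',  NULL, NULL, '2013-05-05'),
(5, 'James', 'johnson', 'james@johnson.com', NULL, NULL, NULL),
(6, 'James', 'privett', 'james@test.com',    NULL, NULL, NULL);
```

With this data, ids 1 and 6 share an email address and ids 1 and 5 share a first name + last name, so id 1 is the only "first" contact of a duplicate group.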

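I need to write a query that returns the first contact for which another contact exists in the same table, where either the email address matches or the first name + last name match.

Is this possible in a single query?

Thanks in advance.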

3 Answers

Try this (SQL Fiddle):

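-- Take the lowest id from every group that shares an email and from
-- every group that shares firstname + lastname, then fetch those rows.
SELECT DISTINCT t.*
FROM (
    SELECT MIN(id) AS [id]
    FROM mytable
    GROUP BY email
    HAVING COUNT(*) > 1
    UNION ALL
    SELECT MIN(id) AS [id]
    FROM mytable
    GROUP BY firstname, lastname
    HAVING COUNT(*) > 1
) dups
JOIN mytable t
    ON t.id = dups.id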
answered 2013-05-24T09:02:19.873

This works (SQLFiddle DEMO):

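-- Rows whose email occurs more than once.
SELECT a.* FROM mytable a
JOIN (
    SELECT email
    FROM mytable
    GROUP BY email
    HAVING count(*) > 1
) b ON a.email = b.email
UNION
-- Rows whose firstname + lastname occur more than once.
-- Note: this returns every row of a duplicate group, not only the first.
SELECT a.* FROM mytable a
JOIN (
    SELECT firstname, lastname
    FROM mytable
    GROUP BY firstname, lastname
    HAVING count(*) > 1
) b ON a.firstname = b.firstname AND a.lastname = b.lastname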

To make sure this query runs fast, ensure that at least the following indexes exist:

 CREATE INDEX i1 ON mytable(email);
 CREATE INDEX i2 ON mytable(firstname, lastname);
answered 2013-05-24T09:19:41.080

One way:

with cte as 
(select c.*,
        row_number() over (partition by email order by id) rnem,
        count(*) over (partition by email) ctem,
        row_number() over (partition by firstname, lastname order by id) rnfl,
        count(*) over (partition by firstname, lastname) ctfl
 from contacts c)
select * from cte
where (ctem > 1 and rnem = 1) or (ctfl > 1 and rnfl = 1)

SQLFiddle here.
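
To see why the filter picks the right rows, it can help to look at the window columns on their own. The sketch below assumes the six sample rows from the question are loaded into a table named contacts (the name used in the query above). The values in the comments are worked out by hand for those rows.

```sql
-- Inspect the per-group row number and group size for both matching rules.
SELECT id, firstname, lastname, email,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id)               AS rnem,
       COUNT(*)     OVER (PARTITION BY email)                           AS ctem,
       ROW_NUMBER() OVER (PARTITION BY firstname, lastname ORDER BY id) AS rnfl,
       COUNT(*)     OVER (PARTITION BY firstname, lastname)             AS ctfl
FROM contacts
ORDER BY id;

-- id | rnem | ctem | rnfl | ctfl
-- 1  | 1    | 2    | 1    | 2     first row of an email group and of a name group
-- 2  | 1    | 1    | 1    | 1
-- 3  | 1    | 1    | 1    | 1
-- 4  | 1    | 1    | 1    | 1
-- 5  | 1    | 1    | 2    | 2     second row of the James johnson group
-- 6  | 2    | 2    | 1    | 1     second row of the james@test.com group
```

Only id 1 satisfies (ctem > 1 and rnem = 1) or (ctfl > 1 and rnfl = 1), so it is the single row the query returns for this sample.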

answered 2013-05-24T09:30:34.453