mysql - Duplicate database records: comparing values in multiple fields

Question

So I'm trying to clean up some phone records in a database table.

I've found out how to find exact matches in 2 fields using:

/* DUPLICATE first & last names */

SELECT 
    `First Name`, 
    `Last Name`, 
     COUNT(*) c 
FROM phone.contacts  
GROUP BY 
    `Last Name`, 
    `First Name` 
HAVING c > 1;

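Woohoo, great.

I'd like to expand it further to look at several fields, to see whether a phone number in 1 of 3 phone fields is a duplicate.

So I want to check 3 fields (general mobile, general phone, business phone):

1. to see that they are not empty ('')
2. to see whether the data (number) in any of them appears in the other 2 phone fields anywhere in the table.

So, pushing my limited SQL past its limit, I came up with the following, which seems to return records with 3 empty phone fields and also records that have no duplicate phone numbers.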

/* DUPLICATE general & business phone nos */

SELECT 
    id, 
   `first name`, 
   `last name`, 
   `general mobile`, 
   `general phone`, 
   `general email`, 
   `business phone`, 
    COUNT(CASE WHEN `general mobile` <> '' THEN 1 ELSE NULL END) as gen_mob, 
    COUNT(CASE WHEN `general phone` <> '' THEN 1 ELSE NULL END) as gen_phone,
    COUNT(CASE WHEN `business phone` <> '' THEN 1 ELSE NULL END) as bus_phone 
FROM phone.contacts 
GROUP BY 
   `general mobile`, 
   `general phone`, 
   `business phone` 
HAVING gen_mob > 1 OR gen_phone > 1 OR bus_phone > 1;

Clearly my logic is flawed, and I wondered if someone could point me in the right direction, offer sympathy, etc.

Many thanks.

score 5 · Accepted Answer

The first thing you should do is shoot the person who named your columns with spaces in them.

Now, try this:

SELECT DISTINCT
   c.id, 
   c.`first name`, 
   c.`last name`, 
   c.`general mobile`, 
   c.`general phone`, 
   c.`business phone`
from contacts_test c
join contacts_test c2
    on (c.`general mobile`!= '' and c.`general mobile` in (c2.`general phone`, c2.`business phone`))
    or (c.`general phone` != '' and c.`general phone` in (c2.`general mobile`, c2.`business phone`))
    or (c.`business phone`!= '' and c.`business phone` in (c2.`general mobile`, c2.`general phone`))

See a live demo of this query in SQLFiddle.

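Note the extra checks of the form phone != ''. They are required because the phone columns are not nullable, so their "unknown" value is blank. Without these checks, false matches would be returned, because blank of course equals blank.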
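The effect of the blank check can be seen in a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite accepts the same backtick quoting) and two made-up rows; only one of the three ON conditions is shown to keep it short.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts_test (
        id INTEGER PRIMARY KEY,
        `general mobile` TEXT NOT NULL,
        `general phone` TEXT NOT NULL,
        `business phone` TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO contacts_test VALUES (?, ?, ?, ?)",
    [
        (1, "555-0101", "", ""),  # one unique number, two blanks
        (2, "", "555-0102", ""),  # one unique number, two blanks
    ],
)

# Without the blank check, the blank mobile of row 2 "matches" blank fields of row 1.
unguarded = conn.execute("""
    SELECT DISTINCT c.id
    FROM contacts_test c
    JOIN contacts_test c2
      ON c.`general mobile` IN (c2.`general phone`, c2.`business phone`)
    ORDER BY c.id
""").fetchall()

# With the check, blank values never take part in the comparison.
guarded = conn.execute("""
    SELECT DISTINCT c.id
    FROM contacts_test c
    JOIN contacts_test c2
      ON c.`general mobile` != ''
     AND c.`general mobile` IN (c2.`general phone`, c2.`business phone`)
    ORDER BY c.id
""").fetchall()

print(unguarded)  # [(2,)]  false match caused by blank = blank
print(guarded)    # []      no real duplicates in this sample
```

Neither sample row shares a real number with the other, so any row returned by the unguarded query is a false positive.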

The DISTINCT keyword was added in case multiple other rows match, which would otherwise result in an n x n result set.
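To illustrate why DISTINCT matters, here is a small sketch that runs the same join with and without it. It again assumes sqlite3 in place of MySQL and three invented rows, where the mobile number of contact 1 also appears as the general phone of contacts 2 and 3.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts_test (
        id INTEGER PRIMARY KEY,
        `general mobile` TEXT NOT NULL,
        `general phone` TEXT NOT NULL,
        `business phone` TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO contacts_test VALUES (?, ?, ?, ?)",
    [
        (1, "555-0100", "", ""),
        (2, "", "555-0100", ""),
        (3, "", "555-0100", ""),
    ],
)

# Same join conditions as in the answer; only the SELECT list differs below.
JOIN_SQL = """
    FROM contacts_test c
    JOIN contacts_test c2
      ON (c.`general mobile` != '' AND c.`general mobile` IN (c2.`general phone`, c2.`business phone`))
      OR (c.`general phone` != '' AND c.`general phone` IN (c2.`general mobile`, c2.`business phone`))
      OR (c.`business phone` != '' AND c.`business phone` IN (c2.`general mobile`, c2.`general phone`))
    ORDER BY c.id
"""

plain = conn.execute("SELECT c.id" + JOIN_SQL).fetchall()
distinct = conn.execute("SELECT DISTINCT c.id" + JOIN_SQL).fetchall()

print(plain)     # [(1,), (1,), (2,), (3,)]  contact 1 is repeated, once per matching row
print(distinct)  # [(1,), (2,), (3,)]
```

Contact 1 matches two other rows, so the plain join returns it twice; DISTINCT collapses those repeats into one row per contact.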

score 1 · Accepted Answer

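In my experience, when cleaning up data it is better to have a comprehensive view of the data and a simple way to manage it than a big, bulky query that does all the analysis at once.

You could also (more or less) renormalize the database, using something like: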

Create view VContactsWithPhones
as
Select id, 
       `Last Name` as LastName, 
       `First Name` as FirstName,
       `General Mobile` as Phone,
       'General Mobile' as PhoneType
From phone.contacts c
UNION
Select id, 
       `Last Name`, 
       `First Name`,
       `General Phone`,
       'General Phone'
From phone.contacts c
UNION
Select id, 
       `Last Name`, 
       `First Name`,
       `Business Phone`,
       'Business Phone'
From phone.contacts c

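This produces a view with three times as many rows as the original table, but with a single Phone column whose PhoneType can be one of three types.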
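As a quick check of the "three times as many rows" claim, the following sketch builds the view over two invented contacts. It assumes sqlite3 in place of MySQL, so the table is named contacts without the phone. schema prefix.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table contents are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        `Last Name` TEXT,
        `First Name` TEXT,
        `General Mobile` TEXT,
        `General Phone` TEXT,
        `Business Phone` TEXT
    )
""")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Smith", "Ann", "555-0100", "", ""),
        (2, "Jones", "Bob", "", "555-0123", "555-0199"),
    ],
)

# One SELECT per phone column; the literal PhoneType records where the value came from.
conn.execute("""
    CREATE VIEW VContactsWithPhones AS
    SELECT id, `Last Name` AS LastName, `First Name` AS FirstName,
           `General Mobile` AS Phone, 'General Mobile' AS PhoneType
    FROM contacts
    UNION
    SELECT id, `Last Name`, `First Name`, `General Phone`, 'General Phone'
    FROM contacts
    UNION
    SELECT id, `Last Name`, `First Name`, `Business Phone`, 'Business Phone'
    FROM contacts
""")

base_rows = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
view_rows = conn.execute("SELECT COUNT(*) FROM VContactsWithPhones").fetchone()[0]
phone_types = [row[0] for row in conn.execute(
    "SELECT DISTINCT PhoneType FROM VContactsWithPhones ORDER BY PhoneType")]

print(base_rows, view_rows)  # 2 6
print(phone_types)           # ['Business Phone', 'General Mobile', 'General Phone']
```

UNION removes duplicate rows, but because every branch carries a different PhoneType literal and the id, no rows collapse here; UNION ALL would give the same result without the deduplication step.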

You can then easily select from that view:

-- empty phones
SELECT * 
FROM VContactsWithPhones 
Where Phone is null or Phone = '';

-- duplicate phones
Select Phone, Count(*)
from VContactsWithPhones 
where (Phone is not null and Phone <> '')  -- exclude empty values
group by Phone
having count(*) > 1;

-- duplicate phones belonging to the same ID (double entries)
Select Phone, ID, Count(*)
from VContactsWithPhones 
where (Phone is not null and Phone <> '')  -- exclude empty values
group by Phone, ID
having count(*) > 1;

-- duplicate phones belonging to different IDs (duplicate entries)
Select v1.Phone, v1.ID, v1.PhoneType, v2.ID, v2.PhoneType
from VContactsWithPhones v1
   inner join VContactsWithPhones v2 
     on v1.Phone=v2.Phone and v1.ID<>v2.ID  -- different IDs, not the same row
where v1.Phone is not null and v1.Phone <> '';
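The duplicate checks against the view can be tried end to end with a small sketch. It assumes sqlite3 in place of MySQL and invented sample rows, and it assumes the last query is meant to pair rows with different IDs; v1.ID < v2.ID is my own choice here so that each pair is listed once rather than in both directions.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table contents are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        `Last Name` TEXT,
        `First Name` TEXT,
        `General Mobile` TEXT,
        `General Phone` TEXT,
        `Business Phone` TEXT
    )
""")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Smith", "Ann", "555-0100", "", ""),
        (2, "Smith", "A.", "", "555-0100", "555-0199"),  # shares a number with contact 1
        (3, "Jones", "Bob", "555-0123", "", ""),
    ],
)
conn.execute("""
    CREATE VIEW VContactsWithPhones AS
    SELECT id, `Last Name` AS LastName, `First Name` AS FirstName,
           `General Mobile` AS Phone, 'General Mobile' AS PhoneType
    FROM contacts
    UNION
    SELECT id, `Last Name`, `First Name`, `General Phone`, 'General Phone'
    FROM contacts
    UNION
    SELECT id, `Last Name`, `First Name`, `Business Phone`, 'Business Phone'
    FROM contacts
""")

# Phone numbers that occur more than once anywhere, ignoring blanks.
duplicates = conn.execute("""
    SELECT Phone, COUNT(*)
    FROM VContactsWithPhones
    WHERE Phone IS NOT NULL AND Phone <> ''
    GROUP BY Phone
    HAVING COUNT(*) > 1
""").fetchall()

# Pairs of different contacts sharing a number; "<" lists each pair only once.
pairs = conn.execute("""
    SELECT v1.Phone, v1.id, v1.PhoneType, v2.id, v2.PhoneType
    FROM VContactsWithPhones v1
    INNER JOIN VContactsWithPhones v2
      ON v1.Phone = v2.Phone AND v1.id < v2.id
    WHERE v1.Phone IS NOT NULL AND v1.Phone <> ''
""").fetchall()

print(duplicates)  # [('555-0100', 2)]
print(pairs)       # [('555-0100', 1, 'General Mobile', 2, 'General Phone')]
```

The pair output shows which column each copy of the number lives in, which is handy when deciding which record to keep.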

And so on...

score 0 · Accepted Answer

You could try something like:

SELECT *
from phone.contacts p
WHERE `general mobile` IN (
    SELECT `general mobile` FROM phone.contacts WHERE id != p.id
    UNION
    SELECT `general phone` FROM phone.contacts WHERE id != p.id
    UNION
    SELECT `general email` FROM phone.contacts WHERE id != p.id
)

Repeat this 3 times, once for each of general mobile, general phone and general email. It could be put into a single query, but it would be less readable.
