mysql - Searching for duplicate customers using LIKE

Question

I'm trying to find duplicate customers in a table that looks like this:

customer_id | first_name | last_name 
-------------------------------------
          0 | Rich       | Smith
          1 | Paul       | Jones
          2 | Richard    | Smith
          3 | Jimmy      | Roberts

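In this case, I need a query that returns customer_id 0 and customer_id 2. The query needs to find matches where a customer may have shortened their first name, e.g. Rich instead of Richard, or Rob instead of Robert.

I have this query, but it only returns one of the matches (not both). I need the query to return both Rich and Richard.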

select distinct customers.customer_id, concat(customers.first_name,' ',customers.last_name) as name from customers
inner join customers dup on customers.last_name = dup.last_name
where (dup.first_name like concat('%', customers.first_name, '%')
and dup.customer_id <> customers.customer_id )
order by name

Can anyone point me in the right direction?

Based on @tsOverflow's answer, this is the final query that solved my problem:

select distinct customers.customer_id, concat(customers.first_name,' ',customers.last_name) as name 
from customers
    inner join customers dup on customers.last_name = dup.last_name
where ((dup.first_name like concat('%', customers.first_name, '%') 
            OR (customers.first_name like concat('%', dup.first_name, '%')) 
        )
    and dup.customer_id <> customers.customer_id )
order by name

The solution above may have performance problems.

score 1 · Accepted Answer

Your problem is that Rich is a substring of Richard, but not the other way around.

This checks both directions:
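To see the asymmetry in isolation, here is a minimal sketch that uses string literals instead of the table. The column aliases are made up for illustration:

```sql
-- A pattern built from the shorter name matches the longer name,
-- but a pattern built from the longer name cannot match the shorter one.
SELECT 'Richard' LIKE CONCAT('%', 'Rich', '%')    AS rich_in_richard,  -- 1
       'Rich'    LIKE CONCAT('%', 'Richard', '%') AS richard_in_rich;  -- 0
```

In the original query the pattern is always built from customers.first_name, so only the row with the shorter name (Rich) is returned.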

select distinct randomtest.customer_id, concat(randomtest.first_name,' ',randomtest.last_name) as name 
from randomtest
    inner join randomtest dup on randomtest.last_name = dup.last_name
where ((dup.first_name like concat('%', randomtest.first_name, '%') 
            OR (randomtest.first_name like concat('%', dup.first_name, '%')) 
        )
    and dup.customer_id <> randomtest.customer_id )
order by name

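I added the OR and the check in the opposite direction. Note that using LIKE in a query can hurt performance - I'm not an expert on this, it's just a thought.

EDIT: as others have mentioned in the comments - this only catches cases where the "shortened" version is actually a substring of the full name. It will not catch cases such as Michael -> Mike or William -> Bill. On the other hand, John and someone named Johnson may well be two completely different people.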
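One possible way to reduce the work, as a sketch rather than a measured fix: index last_name so the self-join can look rows up by last name, compare each pair only once, and match on prefixes only. The index name and column aliases are assumptions, and prefix matching changes the semantics (it no longer finds a short name in the middle of a longer one), so check the result and the EXPLAIN output against your own data.

```sql
-- Assumed index name; supports the equality join on last_name.
CREATE INDEX idx_customers_last_name ON customers (last_name);

-- One row per suspected pair instead of one row per customer.
SELECT a.customer_id AS id_a, a.first_name AS first_a,
       b.customer_id AS id_b, b.first_name AS first_b,
       a.last_name
FROM customers a
INNER JOIN customers b
        ON a.last_name = b.last_name
       AND a.customer_id < b.customer_id   -- each pair is compared once
WHERE b.first_name LIKE CONCAT(a.first_name, '%')   -- a is a prefix of b
   OR a.first_name LIKE CONCAT(b.first_name, '%')   -- b is a prefix of a
ORDER BY a.last_name, id_a;
```

With the sample table this should pair customer 0 (Rich) with customer 2 (Richard). Names that contain % or _ would be treated as wildcards in the pattern.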
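For nicknames that are not substrings, one common approach is a lookup table. The following is only a sketch: the nicknames table, its columns, and its contents are hypothetical, and you would have to fill it yourself.

```sql
-- Hypothetical lookup table mapping short forms to full names.
CREATE TABLE nicknames (
    nickname  VARCHAR(50) NOT NULL,
    full_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (nickname, full_name)
);

INSERT INTO nicknames (nickname, full_name) VALUES
    ('Mike', 'Michael'),
    ('Bill', 'William');

-- Pairs that share a last name and are linked through the lookup table.
SELECT c.customer_id AS nickname_id,  c.first_name AS nickname,
       d.customer_id AS full_name_id, d.first_name AS full_name,
       c.last_name
FROM customers c
INNER JOIN nicknames n ON n.nickname = c.first_name
INNER JOIN customers d ON d.first_name = n.full_name
                      AND d.last_name  = c.last_name
                      AND d.customer_id <> c.customer_id;
```

This avoids the John/Johnson kind of false match, but it only finds the pairs listed in the table.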
