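I'm trying to find potential duplicates in my database. Some people may have duplicate records because a "-" was added to their first or last name (for whatever reason). My query currently does not pull up people who may be duplicated because of the "-". What is the best way to do this?
Here is my query so far: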
SELECT t1.FirstName, t1.LastName, t1.ID, t2.dupeCount
FROM Contact t1
INNER JOIN (
    SELECT FirstName, REPLACE(LastName, '-', ' ') AS LastName, COUNT(*) AS dupeCount
    FROM Contact
    GROUP BY FirstName, LastName
    HAVING COUNT(*) > 1
) t2 ON ((SOUNDEX(t1.LastName) = SOUNDEX(t2.LastName)
        OR SOUNDEX(REPLACE(t1.LastName, '-', ' ')) LIKE '%' + SOUNDEX(t2.LastName) + '%'
        OR SOUNDEX(REPLACE(t2.LastName, '-', ' ')) LIKE '%' + SOUNDEX(t1.LastName) + '%')
    AND SOUNDEX(t1.FirstName) = SOUNDEX(t2.FirstName))
ORDER BY t1.LastName, t1.ID
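
One direction I'm considering, as a rough sketch only: as far as I can tell, the GROUP BY in my subquery groups on the raw LastName column rather than the REPLACE(...) alias, so "Smith-Jones" and "Smith Jones" land in different groups and never reach COUNT(*) > 1. The version below normalizes the names first and counts on the normalized values. It assumes SQL Server (T-SQL) and the same Contact(ID, FirstName, LastName) table. The names Normalized, Counted, FirstKey and LastKey are made up for illustration, and stripping spaces as well as hyphens is my own assumption about which variants should count as the same name.

```sql
-- Sketch only: assumes SQL Server and the Contact table from the query above.
WITH Normalized AS (
    SELECT
        ID,
        FirstName,
        LastName,
        -- Hypothetical comparison keys: drop '-' and ' ' so that
        -- 'Smith-Jones', 'Smith Jones' and 'SmithJones' compare equal.
        REPLACE(REPLACE(FirstName, '-', ''), ' ', '') AS FirstKey,
        REPLACE(REPLACE(LastName,  '-', ''), ' ', '') AS LastKey
    FROM Contact
),
Counted AS (
    SELECT
        ID,
        FirstName,
        LastName,
        -- Count the rows that share the same SOUNDEX codes on the normalized keys.
        COUNT(*) OVER (
            PARTITION BY SOUNDEX(FirstKey), SOUNDEX(LastKey)
        ) AS dupeCount
    FROM Normalized
)
SELECT FirstName, LastName, ID, dupeCount
FROM Counted
WHERE dupeCount > 1          -- keep only rows that have at least one potential duplicate
ORDER BY LastName, ID;
```

Because SOUNDEX is a loose phonetic match, this will probably also flag names that merely sound alike, so the result should be treated as a list of candidates to review rather than confirmed duplicates. Partitioning on FirstKey and LastKey directly, without SOUNDEX, would restrict the matches to names that differ only by hyphens or spaces.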