I have two tables that I need to merge in PostgreSQL on a common column, "company name." Unfortunately, many of the company names don't match exactly (e.g. MICROSOFT in one table, MICROSFT in the other). I've tried removing common words such as "corporation," "inc," or "ltd" from both columns to standardize the names across the two tables, but I'm having trouble thinking of additional strategies. Any ideas?

Thanks.

Also, if necessary I can do this in R.

1 Answer

Have you considered the fuzzystrmatch module? You can use soundex, difference, levenshtein, metaphone, and dmetaphone, or a combination of them.

fuzzystrmatch documentation
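
The module is part of PostgreSQL's contrib packages and has to be enabled in each database before its functions are available. A minimal sketch, assuming PostgreSQL 9.1 or later and a role that is allowed to create extensions (the column aliases are only illustrative):

```sql
-- Enable the module once per database (CREATE EXTENSION needs PostgreSQL 9.1+).
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Run each function on the two spellings from the question to see
-- how they behave before picking one for the join.
SELECT soundex('MICROSOFT')                 AS soundex_code,
       difference('MICROSOFT', 'MICROSFT')  AS soundex_similarity, -- 0 to 4, higher is closer
       levenshtein('MICROSOFT', 'MICROSFT') AS edit_distance,      -- lower is closer
       metaphone('MICROSOFT', 10)           AS metaphone_code,     -- 10 = max code length
       dmetaphone('MICROSOFT')              AS dmetaphone_code;
```

Trying the functions on a handful of known mismatches from your own data is probably the quickest way to decide which one, or which combination, suits it.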

SELECT something
FROM somewhere
WHERE levenshtein(item1, item2) < Carefully_Selected_Threshold

For example, the levenshtein distance from MICROSOFT to MICROSFT is one (1).

levenshtein(dmetaphone('MICROSOFT'), dmetaphone('MICROSFT'))

The above returns zero (0). Combining levenshtein with dmetaphone can help you match many misspellings.
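
Applied to the merge itself, the combination could look roughly like the following. This is only a sketch: table_a, table_b, and their company_name columns are hypothetical names, and the threshold of 2 is an assumption you would need to tune against your own data.

```sql
-- Hypothetical tables: table_a(company_name) and table_b(company_name).
-- Pair rows whose names share a phonetic code and differ by only a few edits.
SELECT a.company_name AS name_a,
       b.company_name AS name_b,
       levenshtein(upper(a.company_name), upper(b.company_name)) AS edit_distance
FROM table_a AS a
JOIN table_b AS b
  ON dmetaphone(a.company_name) = dmetaphone(b.company_name)            -- same phonetic code
 AND levenshtein(upper(a.company_name), upper(b.company_name)) <= 2     -- assumed threshold
ORDER BY edit_distance;
```

The dmetaphone code is short, so on its own it will pair many unrelated names; the levenshtein condition on the full names narrows those candidates down. A name can still match more than one candidate, so it is worth reviewing the pairs before treating them as final.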

Answered 2012-01-19T16:38:25.353