I have a MySQL table with the following columns:

City      Country  Continent
New York  States   North America
New York  Germany  Europe - considering there's one ;)
Paris     France   Europe

If I want to find "New Yokr", with the typo, it is easy using a user-defined MySQL stored function:

$querylev = "SELECT City, Country, Continent FROM `table`
             WHERE LEVENSHTEIN(`City`, 'New Yokr') < 3";
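
For readers without such a stored function at hand, here is a minimal JavaScript sketch of what the query computes. The `levenshtein` helper and the `cities` array are illustrative assumptions, not part of the MySQL setup above; the function is the plain edit distance (insertions, deletions, substitutions), so a swapped pair of letters such as "kr" for "rk" counts as two edits.

```javascript
// Plain Levenshtein distance: insertions, deletions and substitutions each cost 1.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical stand-in for the City column.
const cities = ['New York', 'Paris'];

// Same filter as the WHERE clause: keep cities fewer than 3 edits away.
const hits = cities.filter((city) => levenshtein(city, 'New Yokr') < 3);
console.log(hits); // [ 'New York' ]
```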

But if there are two New York cities, with full-text search you can type "New York States" and get the result you want.

So the question is: can I search for "New Yokr Statse" and get the same result?

Is there any function that merges Levenshtein and full-text search into an all-in-one solution, or should I create a new column in MySQL that concatenates the 3 columns?
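
To make the concatenation idea concrete, here is a rough sketch in JavaScript rather than MySQL. Everything in it is an assumption for illustration: the `rows` array mirrors the sample table, `fuzzyMatch` is a hypothetical helper, and the per-word threshold of 2 edits is arbitrary (short words such as "New" would over-match under a looser limit). It joins the three columns, splits both sides into words, and requires every query word to be close to some word in the row.

```javascript
// Plain Levenshtein distance (insert, delete, substitute each cost 1).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical rows mirroring the table in the question.
const rows = [
  { city: 'New York', country: 'States', continent: 'North America' },
  { city: 'New York', country: 'Germany', continent: 'Europe' },
  { city: 'Paris', country: 'France', continent: 'Europe' },
];

// A row matches when every query word is within maxDist edits
// of at least one word in the concatenated columns.
function fuzzyMatch(row, query, maxDist = 2) {
  const words = `${row.city} ${row.country} ${row.continent}`
    .toLowerCase()
    .split(/\s+/);
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .every((q) => words.some((w) => levenshtein(q, w) <= maxDist));
}

const found = rows.filter((row) => fuzzyMatch(row, 'New Yokr Statse'));
console.log(found.map((r) => `${r.city}, ${r.country}`)); // [ 'New York, States' ]
```

Matching word by word keeps the threshold small; comparing the whole concatenated string against the query would need a much larger distance limit, because the unmatched continent words would count as edits.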

I know there are other solutions, such as Lucene or Sphinx (also soundex and metaphone, but those do not work for this), but I think they might be hard for me to implement.

1 Answer

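Good question, and a good example of how we can use character lists and regular-expression boundaries to design queries and retrieve the data we want.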

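Depending on the accuracy we want and the data we have in our database, we can design custom queries based on various expressions, such as this one for New York State, which tolerates several misspellings: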

([new]+\s+[york]+\s+[stae]+)

Here, we have three character lists, which we can update with other possible letters.

[new]
[york]
[stae]

We have also added two \s+ here as our boundaries to improve accuracy.
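
As a quick check of how permissive these character lists are, the sketch below tests a few strings against the same expression. The `^` and `$` anchors and the sample strings are assumptions added for this illustration only. Each list accepts any run of its letters in any order, so misspellings pass, but so do unrelated words built from the same letters.

```javascript
// Same three character lists, anchored so the whole string must match.
// No `g` flag, so test() keeps no state between calls.
const pattern = /^[new]+\s+[york]+\s+[stae]+$/i;

const samples = [
  'New York State',   // exact spelling
  'New Yokr Statse',  // the misspelled search from the question
  'wen kory teas',    // unrelated words made of the same letters
  'New Jersey State', // "Jersey" contains letters outside [york]
];

const results = samples.map((s) => [s, pattern.test(s)]);
for (const [s, ok] of results) {
  console.log(`${s} -> ${ok}`);
}
// New York State -> true
// New Yokr Statse -> true
// wen kory teas -> true
// New Jersey State -> false
```

If false positives like the third sample matter, the lists can be tightened, or the candidates can be re-checked with an edit-distance threshold afterwards.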

Demo

This snippet just shows how the capturing groups work:

const regex = /([new]+\s+[york]+\s+[stae]+)/gmi;
const str = `Anything we wish to have before followed by a New York Statse then anything we wish to have after. Anything we wish to have before followed by a New  Yokr  State then anything we wish to have after. Anything we wish to have before followed by a New Yokr Stats then anything we wish to have after. Anything we wish to have before followed by a New York Statse then anything we wish to have after. `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

PHP

$re = '/([new]+\s+[york]+\s+[stae]+)/mi';
$str = 'Anything we wish to have before followed by a New York Statse then anything we wish to have after. Anything we wish to have before followed by a New  Yokr  State then anything we wish to have after. Anything we wish to have before followed by a New Yokr Stats then anything we wish to have after. Anything we wish to have before followed by a New York Statse then anything we wish to have after. ';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);
Answered 2019-05-27T18:00:28.257