
Can a levenshtein search check every word of the search query against an array?

The code is as follows:

    $input = $query;

    // array of words to check against
    $words  = $somearray;

    // no shortest distance found, yet
    $shortest = -1;

    // loop through words to find the closest
    foreach ($words as $word) {

        // calculate the distance between the input word,
        // and the current word
        $lev = levenshtein($input, $word);

        // check for an exact match
        if ($lev == 0) {

            // closest word is this one (exact match)
            $closest = $word;
            $shortest = 0;

            // break out of the loop; we've found an exact match
            break;
        }

        // if this distance is less than the next found shortest
        // distance, OR if a next shortest word has not yet been found
        if ($lev <= $shortest || $shortest < 0) {
            // set the closest match, and shortest distance
            $closest  = $word;
            $shortest = $lev;
        }
    }

    if ($shortest == 0) {
        echo "Exact match found: $closest\n";
    } else {
        echo "Did you mean: $closest?\n";
    }

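As written, this probably treats either just the first word or the whole sentence as the single string to match against the array. How can I get a result for every word and display the whole sentence with the corrected words?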


1 Answer


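OK, from what I understand of your question, you first need to split the sentence into words; see, for example: How to convert a sentence into an array of words?

After that, you can compare each word against your dictionary by iterating over the first array and, inside that loop, looping over the second, for example: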
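As a minimal sketch of that splitting step, assuming words are separated by whitespace only (punctuation stays attached to the word), something like the following could work. The helper name split_words() is hypothetical and not part of the question's code:

```php
<?php
// Hypothetical helper: split a query string into words on runs of whitespace.
function split_words(string $sentence): array
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so an empty or
    // whitespace-only query yields an empty array.
    return preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
}
```

The resulting array can then be used as $words in the loop over the dictionary.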

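    foreach ($words as $word)
    {
        // fall back to the word itself so a suggestion from the
        // previous word never carries over
        $suggestion = $word;
        $min_distance = strlen($word); // use mb_strlen() for non-Latin
        foreach ($dictionary as $new_word)
        {
            $dist = levenshtein($word, $new_word);
            // before PHP 8, levenshtein() returns -1 for strings longer than 255 characters
            if (($dist < $min_distance) and ($dist > -1))
            {
                $min_distance = $dist;
                $suggestion = $new_word;
            }
        }
    }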

Then, for each word, if $min_distance is greater than 0, suggest $suggestion.
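To show the whole sentence with the corrected words, as the question asks, the pieces could be put together roughly like this. This is only a sketch under some assumptions: closest_word() and correct_sentence() are hypothetical names, words are split on whitespace, the comparison is case-sensitive (as levenshtein() is), and every word is replaced by its nearest dictionary entry with no distance threshold:

```php
<?php
// Hypothetical helper: return the dictionary entry closest to $word.
function closest_word(string $word, array $dictionary): string
{
    $best = $word;        // fall back to the input if the dictionary is empty
    $bestDistance = null; // no distance measured yet

    foreach ($dictionary as $candidate) {
        $distance = levenshtein($word, $candidate);
        if ($distance < 0) {
            continue; // before PHP 8: -1 for strings over 255 characters
        }
        if ($distance === 0) {
            return $candidate; // exact match, nothing can be closer
        }
        if ($bestDistance === null || $distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }
    return $best;
}

// Hypothetical helper: correct each word and rebuild the sentence.
function correct_sentence(string $sentence, array $dictionary): string
{
    $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
    $corrected = [];
    foreach ($words as $word) {
        $corrected[] = closest_word($word, $dictionary);
    }
    return implode(' ', $corrected);
}
```

If the returned string differs from the original query, it can be shown as the "Did you mean: ...?" line.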

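Note that this is really quite inefficient! With n words in the query and m entries in the dictionary, it runs in Θ(n*m), assuming levenshtein() runs in O(1), because you have to loop through the entire dictionary for every word. You may want to read up, at a conceptual level, on how these things are designed in real life, or at least only offer suggestions for longer words and only loop through the more relevant parts of the dictionary.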
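One possible reading of "the more relevant parts of the dictionary" is to skip candidates that cannot win based on length alone. With the default costs, the difference in length between two strings is a lower bound on their Levenshtein distance, so a candidate whose length differs by at least the best distance found so far can be skipped without calling levenshtein(). The sketch below assumes single-byte strings; closest_word_pruned() is a hypothetical name:

```php
<?php
// Hypothetical helper: nearest dictionary entry, skipping hopeless candidates.
function closest_word_pruned(string $word, array $dictionary): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    $length = strlen($word);

    foreach ($dictionary as $candidate) {
        // The length difference is a lower bound on the edit distance,
        // so this candidate cannot beat the current best.
        if (abs($length - strlen($candidate)) >= $bestDistance) {
            continue;
        }
        $distance = levenshtein($word, $candidate);
        if ($distance >= 0 && $distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
            if ($distance === 0) {
                break; // exact match
            }
        }
    }
    return $best; // null when the dictionary is empty
}
```

The worst case is still one levenshtein() call per dictionary entry; the check only helps once a reasonably close candidate has been seen.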

answered 2013-02-04T14:37:12.190