php - Checking two strings for an approximate match in PHP

Question

I am trying to check strings for approximate similarity.

These are the criteria I use.

1) The order of the words matters 2) Words may be 80% similar.

Example:

$string1 = "How much will it cost to me"; //string in vocabulary (all the "right" words are here)
$string2 = "How much does costs it ";   //"costs" instead of "cost" is a deliberate mistake (user input)

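Algorithm:
1) Check word similarity and create a clean string from the "right" words, in the order they appear in the vocabulary. OUTPUT: "How much it cost"
2) Create a clean string from the "right" words, in the order they appear in the user input. OUTPUT: "How much cost it"
3) Compare the two outputs: if they differ, return no; if they are the same, return yes.

Any suggestions?

I started writing the code, but I am not familiar with the tools available in PHP, so I do not know how to do it sensibly and efficiently.

It looks more like a mix of JavaScript and PHP: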

$string1 = "how much will it cost for me";
$string2 = "how much does costs it";

function compareStrings($s1, $s2) {

    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // collapse runs of spaces into a single space
    while (strpos($s1, "  ") !== false) {
        $s1 = str_replace("  ", " ", $s1);
    }
    while (strpos($s2, "  ") !== false) {
        $s2 = str_replace("  ", " ", $s2);
    }

    $ar1 = explode(" ", $s1); // vocabulary words
    $ar2 = explode(" ", $s2); // user input words
    $l1 = count($ar1);
    $l2 = count($ar2);

    $meaning = "";
    $rightorder = array();

    for ($i = 0; $i < $l1; $i++) {
        for ($j = 0; $j < $l2; $j++) {
            similar_text($ar1[$i], $ar2[$j], $perc);
            if ($perc >= 85) {
                $meaning = $meaning . " " . $ar1[$i]; //generating a string of the first output
                $rightorder[$j] = $ar1[$i]; //generating the array with second output
            }
        }
    }

}

The idea is that $meaning will end up as "how much it cost", and $rightorder will end up as

$rightorder[0]='how'
$rightorder[1]='much'
$rightorder[2]=''
$rightorder[3]='cost'
$rightorder[4]='it'

After that I will somehow convert it back into the string "how much cost it"

and compare the two.

if ("how much cost it"=="how much it cost") return true; else return false;

score 1 · Accepted Answer

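Your question belongs to the field of NLP (natural language processing).

Each of the problems mentioned in the question has its own area of research:

Splitting a string into words is tokenization. It seems trivial in English, but it is not in other languages, German for example. There is also the question of how to handle punctuation.
Creating the "right" words is called stemming. There are many tools for this. If your words are in English, you could try the Porter Stemming Algorithm. Other languages may have their own stemming techniques, and dictionary-based algorithms usually exist.
One common measure for the similarity of strings based on the occurrence of individual words is "cosine similarity". There are many other techniques as well. There is also the problem of synonyms and polysemous words.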
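
To make the first and last points more concrete, here is a minimal sketch of tokenization followed by cosine similarity over word counts. The function names tokenize and cosineSimilarity are made up for this illustration, and the tokenizer assumes plain ASCII English text; it does no stemming, so "cost" and "costs" count as different words.

```php
<?php
// Hypothetical helper: lower-case the text and split it on anything
// that is not an ASCII letter, digit or apostrophe.
function tokenize(string $text): array
{
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? [] : $words;
}

// Hypothetical helper: cosine similarity between the word-count vectors
// of two strings. Word order is ignored by this measure.
function cosineSimilarity(string $a, string $b): float
{
    $countsA = array_count_values(tokenize($a));
    $countsB = array_count_values(tokenize($b));

    // Dot product over the words that occur in both strings.
    $dot = 0;
    foreach ($countsA as $word => $count) {
        $dot += $count * ($countsB[$word] ?? 0);
    }

    // Euclidean length of each word-count vector.
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $countsA)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $countsB)));

    if ($normA == 0 || $normB == 0) {
        return 0.0; // at least one string has no words
    }
    return $dot / ($normA * $normB);
}
```

Note that "how much it cost" and "how much cost it" score as practically identical here, because only word counts are compared. Since word order matters in your case, cosine similarity alone will not be enough.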

I hope this helps, since your question is a mixture of the problems above.
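
As a rough starting point for the order check described in the question, the steps could be sketched like this. It swaps real stemming for PHP's built-in similar_text() as a stand-in, splits on whitespace only, and uses the 80% threshold from the stated criteria as an assumed default. The function name sameWordOrder is hypothetical.

```php
<?php
// Hypothetical helper, not from the question: returns true when the words
// shared by both strings (allowing near matches) appear in the same order.
function sameWordOrder(string $vocabulary, string $input, float $threshold = 80.0): bool
{
    $vocabWords = preg_split('/\s+/', strtolower(trim($vocabulary)), -1, PREG_SPLIT_NO_EMPTY);
    $inputWords = preg_split('/\s+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);

    $vocabOrder = []; // matched "right" words in vocabulary order
    $inputOrder = []; // the same words, keyed by their position in the input

    foreach ($vocabWords as $vocabWord) {
        foreach ($inputWords as $position => $inputWord) {
            if (isset($inputOrder[$position])) {
                continue; // this input word is already matched
            }
            similar_text($vocabWord, $inputWord, $percent);
            if ($percent >= $threshold) {
                $vocabOrder[] = $vocabWord;
                $inputOrder[$position] = $vocabWord;
                break; // match each vocabulary word at most once
            }
        }
    }

    ksort($inputOrder); // put the matches back into input order
    return $vocabOrder === array_values($inputOrder);
}

var_dump(sameWordOrder("how much will it cost for me", "how much does costs it"));
var_dump(sameWordOrder("how much will it cost for me", "how much it costs"));
```

With the strings from the question, the first call should give false ("how much it cost" against "how much cost it") and the second true. Be aware that this sketch also returns true when no words match at all, so you may want to require a minimum number of matched words.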
