Is there any function in PHP that checks the percentage of similarity between two strings?
For example, I have:
$string1="Hello how are you doing"
$string2= " hi, how are you"
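and function($string1, $string2)
would return true, because the words "how", "are" and "you" appear in both strings.
Or, even better, it would return 60% similarity, because "how", "are" and "you" are 3 of the 5 words in $string1.
Is there any function in PHP that does this?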
As it's a nice question, I put some effort into it:
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

echo 'Compare result: ' . compareStrings($string1, $string2) . '%';
// 60%

function compareStrings($s1, $s2) {
    // one is empty, so no result
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // replace non-alphanumeric characters
    // I left - in, in case it's used to combine words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

    // collapse runs of two or more spaces into a single space
    $s1clean = preg_replace('/ {2,}/', ' ', $s1clean);
    $s2clean = preg_replace('/ {2,}/', ' ', $s2clean);

    // create arrays
    $ar1 = explode(" ", $s1clean);
    $ar2 = explode(" ", $s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    // swap the arrays if needed so ar1 is always the largest
    if ($l2 > $l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    // flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);

    $maxwords = max($l1, $l2);
    $matches = 0;

    // find matching words
    foreach ($ar1 as $word) {
        if (array_key_exists($word, $ar2)) {
            $matches++;
        }
    }

    return ($matches / $maxwords) * 100;
}
?>
As other answers have already said, you can use similar_text. Here is a demo:
$string1="Hello how are you doing" ;
$string2= " hi, how are you";
echo similar_text($string1, $string2, $perc); //12
echo $perc; //61.538461538462
This returns 12 and sets $perc to the similarity percentage you asked for.
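For what it's worth, the percentage can be reproduced by hand. As far as I know, PHP derives it as twice the number of matching characters divided by the combined length of both strings. The sketch below just checks that on the strings from the question; the variable names $matching and $byHand are only for illustration.

```php
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

// similar_text returns the number of matching characters and writes
// the percentage into the third argument, which is passed by reference
$matching = similar_text($string1, $string2, $perc);

// recompute the percentage by hand: 2 * matches / (len1 + len2) * 100
$byHand = $matching * 2 / (strlen($string1) + strlen($string2)) * 100;

echo $matching, "\n";           // 12
echo round($perc, 2), "\n";     // 61.54
echo round($byHand, 2), "\n";   // 61.54
```

Because the score is based on characters rather than words, strings that share no whole word can still score well above zero.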
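In addition to Alex Siri's answer, and according to the following article:
http://docstore.mik.ua/orelly/webprog/php/ch04_06.htm
PHP provides several functions that let you test whether two strings are approximately equal: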
$string1="Hello how are you doing" ;
$string2= " hi, how are you";
soundex
if (soundex($string1) == soundex($string2)) {
echo "similar";
} else {
echo "not similar";
}
metaphone
if (metaphone($string1) == metaphone($string2)) {
echo "similar";
} else {
echo "not similar";
}
similar_text
$similarity = similar_text($string1, $string2);
levenshtein
$distance = levenshtein($string1, $string2);
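Note that levenshtein returns an edit distance (the number of single-character insertions, deletions and replacements), not a percentage. One common way to turn it into a score is to divide by the length of the longer string. This is only a sketch of that idea; the helper name levenshteinPercent is made up for the example, and the lengths are counted in bytes.

```php
<?php
// Hypothetical helper: turns the Levenshtein edit distance into a 0-100 score.
function levenshteinPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }
    // distance 0 means identical; distance equal to $maxLen means nothing lines up
    return (1 - levenshtein($a, $b) / $maxLen) * 100;
}

echo levenshtein("kitten", "sitting"), "\n";                        // 3
echo round(levenshteinPercent("kitten", "sitting"), 2), "\n";       // 57.14
echo round(levenshteinPercent("Hello how are you doing", " hi, how are you"), 2), "\n";
```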
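OK, here is my function, which makes things a bit more interesting.
I'm checking the approximate similarity of strings.
These are the criteria I use: word order matters, and two words count as the same when they are at least 85% similar.
Example:
$string1 = "How much will it cost to me" (string in vocabulary)
$string2 = "How much does costs it " // ("costs" instead of "cost" is a mistake) (user input)
Algorithm:
1) Check the similarity of the words and build a clean string from the "right" words, in the order they appear in the vocabulary. Output: "How much it cost"
2) Build a clean string from the "right" words, in the order they appear in the user input. Output: "How much cost it"
3) Compare the two outputs: if they differ, return no; if they are the same, return yes.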
error_reporting(E_ALL);
ini_set('display_errors', true);

$string1 = "сколько это стоит ваще";
$string2 = "сколько будет стоить это будет мне";

if (compareStrings($string1, $string2)) {
    echo "yes";
} else {
    echo "no";
}

function compareStrings($s1, $s2) {
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return false;
    }

    // split on whitespace, ignoring repeated, leading and trailing spaces
    $ar1 = preg_split('/\s+/u', $s1, -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/\s+/u', $s2, -1, PREG_SPLIT_NO_EMPTY);

    $meaning = [];    // "right" words in vocabulary order
    $rightorder = []; // "right" words in user-input order

    // step 1: walk the vocabulary string and keep words that have a match
    foreach ($ar1 as $w1) {
        foreach ($ar2 as $w2) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $meaning[] = $w1;
            }
        }
    }

    // step 2: walk the user input and keep the matching vocabulary words
    // (the original indexed $ar1 with the $ar2 counter here, which reads
    // past the end of $ar1 when the user input is longer)
    foreach ($ar2 as $w2) {
        foreach ($ar1 as $w1) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $rightorder[] = $w1;
            }
        }
    }

    // step 3: same words in the same order means "yes"
    return $meaning === $rightorder;
}
I would love to hear your opinions and suggestions on how to improve it.
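For reference, this is how the 85% threshold plays out on the English example, where "costs" should still count as "cost". It is only a sketch to show the numbers; $p1 and $p2 are illustrative names.

```php
<?php
// "cost" vs "costs": 4 matching characters, lengths 4 and 5
// percentage = 2 * 4 / (4 + 5) * 100
similar_text("cost", "costs", $p1);
echo round($p1, 2), "\n"; // 88.89, above the 85 threshold

// "will" vs "does": no characters in common
similar_text("will", "does", $p2);
echo round($p2, 2), "\n"; // 0, so these words are not treated as the same
```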
You can use the PHP function similar_text.
int similar_text ( string $first , string $second [, float &$percent ] )
Check the PHP doc at: http://php.net/manual/en/function.similar-text.php
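Although this question is quite old, I'm adding my solution for a few reasons. First, according to his comment, the author wanted to compare similar words rather than strings. Second, most answers try to solve it with similar_text, which is not suited to this problem: it compares text character by character, so it can also report a match between completely different strings. The first answer, given by @Hugo Delsing, uses array_flip, which swaps keys and values, but if a word is repeated several times it will only be considered once. I've posted the answer below, which compares words. The only issue it may have is that it does not take the order of the words into account very well.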
function compareStrings($s1, $s2)
{
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // -1 means "no limit" (passing null here is deprecated as of PHP 8.1)
    $ar1 = preg_split('/[^\w\-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/[^\w\-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY);

    $l1 = count($ar1);
    if ($l1 == 0) {
        return 0; // no words in the first string, avoid dividing by zero
    }

    $ar2_copy = array_values($ar2);
    $matched_indices = [];
    $word_map = []; // word => remaining indices of that word in the second string

    foreach ($ar1 as $k => $w1) {
        // !empty instead of isset, so an exhausted list is not read past its end
        if (!empty($word_map[$w1])) {
            if ($word_map[$w1][0] >= $k) {
                $matched_indices[$k] = $word_map[$w1][0];
            }
            array_splice($word_map[$w1], 0, 1);
        } else {
            // strict comparison, so numeric-looking words are compared as strings
            $indices = array_keys($ar2_copy, $w1, true);
            $index_count = count($indices);
            if ($index_count) {
                if ($index_count == 1) {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at the given index from the second array so that it won't match again
                    unset($ar2_copy[$indices[0]]);
                } else {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at the given indices from the second array so that it won't match again
                    foreach ($indices as $index) {
                        unset($ar2_copy[$index]);
                    }
                    array_splice($indices, 0, 1);
                    $word_map[$w1] = $indices;
                }
            }
        }
    }

    return round(count($matched_indices) * 100 / $l1, 2);
}
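The point about array_flip and repeated words can be seen in a few lines. This is a separate sketch, not part of the function above: it assumes array_count_values as one possible way to keep the repeat counts, and the helper name sharedWordCount is made up for the example.

```php
<?php
$words = ["you", "you", "are", "ok"];

// array_flip keeps one entry per distinct word (the last index wins),
// so the fact that "you" occurs twice is lost
$flipped = array_flip($words);         // ["you" => 1, "are" => 2, "ok" => 3]

// array_count_values remembers how often each word occurs
$counts = array_count_values($words);  // ["you" => 2, "are" => 1, "ok" => 1]

// Hypothetical helper: counts shared words, honouring repeats on both sides.
function sharedWordCount(array $a, array $b): int
{
    $ca = array_count_values($a);
    $cb = array_count_values($b);
    $shared = 0;
    foreach ($ca as $word => $n) {
        // a word repeated n times can match at most as often as it occurs in $b
        $shared += min($n, $cb[$word] ?? 0);
    }
    return $shared;
}

echo count($flipped), "\n";                                                    // 3
echo sharedWordCount(["you", "you", "are"], ["you", "are", "you", "you"]), "\n"; // 3
```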