As acfrancis has already answered: it doesn't get much simpler than using the built-in levenshtein function.
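For reference, here is a minimal sketch of what the levenshtein route could look like. The sample strings and the way the distance is turned into a percentage are my own assumptions for illustration, not something taken from the other answer.

```php
<?php
// Two short sample questions (made up for this illustration).
$question1 = 'how many may be greater than zero';
$question2 = 'how many can be greater than zero';

// levenshtein() returns the number of single-character edits
// needed to turn one string into the other.
$distance = levenshtein($question1, $question2);

// One possible way (an assumption) to express that as a percentage:
// scale the distance by the length of the longer string.
$longest = max(strlen($question1), strlen($question2));
$similarity = 100 * (1 - $distance / $longest);

echo $distance, "\n";                      // 2 ("may" -> "can" changes two characters)
echo number_format($similarity, 2), "\n";  // 93.94
```

Note that this compares characters, not words, so reordering a phrase within the question costs a lot of edits.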
However, to answer your final question: yes, doing it the way you suggest is feasible and not too difficult.
Code
function checkQuestions($para1, $para2){
    //Lowercase, replace non-alphanumeric characters with spaces, split on spaces,
    //then drop empty entries and duplicate words
    $arr1 = array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($para1)))));
    $arr2 = array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($para2)))));
    $intersect = array_intersect($arr1, $arr2);
    $p1 = count($arr1); //Number of unique words in para1
    $p2 = count($arr2); //Number of unique words in para2
    $in = count($intersect); //Number of words in intersect
    $lowest = min($p1, $p2); //Smaller of the two word counts
    if ($lowest === 0) {
        //At least one input contains no words, so avoid dividing by zero
        return array('Average' => '0.00', 'Smallest' => '0.00');
    }
    return array(
        'Average' => number_format((100 / (($p1+$p2) / 2)) * $in, 2), //Percentage the same compared to average length of questions
        'Smallest' => number_format((100 / $lowest) * $in, 2) //Percentage the same compared to shortest question
    );
}
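Explanation
1. We define a function that accepts two arguments (the questions we want to compare).
2. We filter the input and convert it to an array:
   - Make the input lowercase (strtolower)
   - Filter out non-alphanumeric characters (preg_replace)
3. We explode the filtered string on spaces.
4. We filter the created array:
   - Remove empty elements (array_filter)
   - Remove duplicates (array_unique)
5. Repeat steps 2-4 for the second question.
6. Find the words that match in both arrays and move them to a new array, $intersect.
7. Count the number of words in each of the three arrays: $p1, $p2, and $in.
8. Calculate the percentage similarity and return it.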
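To make the tokenizing steps above easier to follow, here is a small sketch that runs them one at a time on two short inputs. The helper name questionWords and the sample strings are assumptions made for this illustration; the function in the answer does the same work inline.

```php
<?php
// Hypothetical helper (not part of the answer's function):
// turns one question into its list of unique lowercase words.
function questionWords(string $question): array
{
    $lower   = strtolower($question);                        // make the input lowercase
    $cleaned = preg_replace('/[^a-zA-Z0-9]/', ' ', $lower);  // non-alphanumerics become spaces
    $pieces  = explode(' ', $cleaned);                       // split on spaces
    // array_filter drops the empty strings left by repeated spaces,
    // array_unique drops repeated words, array_values reindexes from 0.
    return array_values(array_unique(array_filter($pieces)));
}

$words1 = questionWords('Of them, at the most, how many?');
$words2 = questionWords('Of them how many?');
$shared = array_intersect($words1, $words2);

echo implode(' ', $words1), "\n"; // of them at the most how many
echo count($words1), ' ', count($words2), ' ', count($shared), "\n"; // 7 4 4
```

With these counts, the 'Smallest' figure would be 100 / 4 * 4 = 100 and the 'Average' figure 100 / 5.5 * 4, roughly 72.73.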
You then need to set a threshold for how similar the questions must be before they are considered the same, e.g. 80%.
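One way to apply such a threshold is to wrap the comparison in a function that returns true or false. This is only a sketch: the names averageOverlap and isSameQuestion, and the default of 80, are assumptions for illustration, and the overlap helper repeats the word-matching logic so that the snippet runs on its own.

```php
<?php
// Hypothetical helper: shared unique words as a percentage of the
// average number of unique words in the two questions.
function averageOverlap(string $a, string $b): float
{
    $split = function (string $s): array {
        return array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($s)))));
    };
    $w1 = $split($a);
    $w2 = $split($b);
    $total = count($w1) + count($w2);
    if ($total === 0) {
        return 0.0; // no words in either input, nothing to compare
    }
    return 100 * count(array_intersect($w1, $w2)) / ($total / 2);
}

// Hypothetical wrapper: the threshold is a parameter so it can be tuned.
function isSameQuestion(string $a, string $b, float $threshold = 80.0): bool
{
    return averageOverlap($a, $b) >= $threshold;
}

$q1 = 'The average of 20 numbers is zero. Of them, at the most, how many may be greater than zero?';
$q2 = 'The average of 20 numbers is zero. Of them how many may be greater than zero?';
$q3 = 'What is the average of 20 numbers?';

var_dump(isSameQuestion($q1, $q2)); // bool(true), overlap is about 93.33
var_dump(isSameQuestion($q2, $q3)); // bool(false)
```

The right threshold depends on your data, so it is worth trying a few values against questions you already know to be duplicates.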
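Notes
- The function returns an array with two values. The first compares the number of shared words against the average length of the two input questions, the second against the shortest question only. You could modify it to return a single value.
- I used number_format for the percentages... but returning an int would probably be fine.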
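As a sketch of that modification, the variant below returns a single rounded integer based on the average length only. The function name similarityPercent is an assumption for illustration; the matching logic is the same as in the function above.

```php
<?php
// Hypothetical single-value variant: returns the 'Average' percentage
// as an int instead of an array of formatted strings.
function similarityPercent(string $para1, string $para2): int
{
    $arr1 = array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($para1)))));
    $arr2 = array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($para2)))));
    $total = count($arr1) + count($arr2);
    if ($total === 0) {
        return 0; // no words in either input
    }
    $in = count(array_intersect($arr1, $arr2));
    // 100 * $in / ($total / 2), rounded to the nearest whole percent
    return (int) round(200 * $in / $total);
}

$question1 = 'The average of 20 numbers is zero. Of them, at the most, how many may be greater than zero?';
$question2 = 'The average of 20 numbers is zero. Of them how many may be greater than zero?';

echo similarityPercent($question1, $question2), "\n"; // 93
```

Returning an int also makes comparisons such as >= 80 work on a number directly, without relying on PHP converting the formatted string first.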
Examples
Example 1
$question1 = 'The average of 20 numbers is zero. Of them, at the most, how many may be greater than zero?';
$question2 = 'The average of 20 numbers is zero. Of them how many may be greater than zero?';
if (checkQuestions($question1, $question2)['Average'] >= 80) {
    echo "Questions are the same...";
} else {
    echo "Questions are not the same...";
}
//Output: Questions are the same...
Example 2
$para1 = 'The average of 20 numbers is zero. Of them, at the most, how many may be greater than zero?';
$para2 = 'The average of 20 numbers is zero. Of them how many may be greater than zero?';
$para3 = 'The average of 20 numbers is zero. Of them how many may be greater than zero, at the most?';
var_dump(checkQuestions($para1, $para2));
var_dump(checkQuestions($para1, $para3));
var_dump(checkQuestions($para2, $para3));
/**
Output:
array(2) {
["Average"]=>
string(5) "93.33"
["Smallest"]=>
string(6) "100.00"
}
array(2) {
["Average"]=>
string(6) "100.00"
["Smallest"]=>
string(6) "100.00"
}
array(2) {
["Average"]=>
string(5) "93.33"
["Smallest"]=>
string(6) "100.00"
}
*/