There is no built-in way to compare two articles. levenshtein() and similar_text() are designed to compare two words, not whole articles.
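The simplest algorithm is to split both articles into words, find the word-by-word similarity, and then do whatever math on it suits your task, like this: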
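// not tested!
function similar_articles($articleA, $articleB) {
    // Split each article into unique words; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $wordsA = array_unique(preg_split('@[\W]+@', $articleA, -1, PREG_SPLIT_NO_EMPTY));
    $wordsB = array_unique(preg_split('@[\W]+@', $articleB, -1, PREG_SPLIT_NO_EMPTY));
    if (count($wordsA) === 0) {
        return 0; // avoid division by zero
    }
    $resultSimilarity = 0;
    foreach ($wordsA as $wordA) {
        // Best match for this word among the words of article B.
        $wordSimilarity = 0;
        foreach ($wordsB as $wordB) {
            similar_text($wordA, $wordB, $percent);
            $wordSimilarity = max($wordSimilarity, $percent);
        }
        $resultSimilarity += $wordSimilarity;
    }
    // Average best-match percentage over the words of article A.
    return $resultSimilarity / count($wordsA);
}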
Note: in general, similar_articles($articleA, $articleB) != similar_articles($articleB, $articleA), because similar_text($wordA, $wordB) != similar_text($wordB, $wordA) and because the score is averaged over the words of the first article only.
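If an order-independent score is needed, one option is to compute the directional score both ways and average the two. The following is only a minimal sketch of that idea, not part of the original answer: the function names directional_similarity() and symmetric_article_similarity() are hypothetical, and averaging is just one of several reasonable ways to combine the two directions.

```php
<?php
// Hypothetical helper: directional score, the same idea as similar_articles().
function directional_similarity(string $from, string $to): float {
    // PREG_SPLIT_NO_EMPTY drops empty strings caused by leading/trailing punctuation.
    $wordsFrom = array_unique(preg_split('@\W+@', $from, -1, PREG_SPLIT_NO_EMPTY));
    $wordsTo   = array_unique(preg_split('@\W+@', $to, -1, PREG_SPLIT_NO_EMPTY));
    if (count($wordsFrom) === 0 || count($wordsTo) === 0) {
        return 0.0; // nothing to compare
    }
    $total = 0.0;
    foreach ($wordsFrom as $wordFrom) {
        // Keep the best similar_text() percentage for this word.
        $best = 0.0;
        foreach ($wordsTo as $wordTo) {
            similar_text($wordFrom, $wordTo, $percent);
            $best = max($best, $percent);
        }
        $total += $best;
    }
    return $total / count($wordsFrom);
}

// Hypothetical wrapper: average both directions so argument order does not matter.
function symmetric_article_similarity(string $a, string $b): float {
    return (directional_similarity($a, $b) + directional_similarity($b, $a)) / 2;
}
```

With this wrapper, symmetric_article_similarity($articleA, $articleB) and symmetric_article_similarity($articleB, $articleA) return the same value. It still compares every word of one article against every word of the other, so it may be slow on long texts.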