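- Calculate the position (word index) of each search term in the subject string.
- Calculate the average position of all the terms in the search string.
- Calculate the absolute difference between the average position in the subject string and the average position in the list of search terms.
- Calculate the absolute differences of the term positions relative to those averages.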
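The steps above can be condensed into one formula. The notation is my own shorthand, not part of the original code: $n$ is the number of terms, $p_i$ is the word index in the subject of the $i$-th term (counting from 0), $\bar{p}$ is the average of those indices, and $\bar{t}$ is the average index of the terms within the search list itself.

```latex
\[
\begin{aligned}
\bar{p} &= \frac{1}{n}\sum_{i=0}^{n-1} p_i, \qquad \bar{t} = \frac{n-1}{2},\\
\text{rank} &= \left|\bar{p}-\bar{t}\right| + \sum_{i=0}^{n-1}\left|(i-\bar{t})-(p_i-\bar{p})\right|.
\end{aligned}
\]
```

As a quick check, take the subject "a ccc b" with the terms "a", "b": $p_0 = 0$ and $p_1 = 2$, so $\bar{p} = 1$ and $\bar{t} = 0.5$. The first part contributes $|1 - 0.5| = 0.5$, and each of the two terms contributes another $0.5$, giving a rank of $1.5$.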
    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Text.RegularExpressions;
    // Assumes terms is non-empty (an empty list would divide by zero below).
    decimal Rank(string subject, IList<string> terms)
    {
        // Isolate all the words in the subject, lower-cased for case-insensitive matching.
        var words = Regex.Matches(subject, @"\w+")
            .Cast<Match>()
            .Select(m => m.Value.ToLower())
            .ToList();

        // Calculate the positions (word index of each term's first occurrence).
        var positions = new List<int>();
        var sumPositions = 0;
        foreach (var term in terms)
        {
            int pos = words.IndexOf(term.ToLower());
            if (pos < 0) return decimal.MaxValue; // term not found: worst possible rank
            positions.Add(pos);
            sumPositions += pos;
        }

        // Calculate the difference in average positions.
        decimal averageSubject = (decimal) sumPositions / terms.Count;
        decimal averageTerms = (terms.Count - 1) / 2m; // average(0..n-1)
        decimal rank = Math.Abs(averageSubject - averageTerms);

        // Add the difference of each term's position relative to the averages.
        for (int i = 0; i < terms.Count; i++)
        {
            decimal relativePos1 = positions[i] - averageSubject;
            decimal relativePos2 = i - averageTerms;
            rank += Math.Abs(relativePos2 - relativePos1);
        }
        return rank;
    }
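I use lower values for better matches, because it is easier to measure the distance from a perfect match than to assign a score to each match.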
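Since a lower rank means a better match and a missing term yields `decimal.MaxValue`, candidates can be sorted in ascending order once the non-matches are dropped. The following is only a minimal sketch of that idea: the `RankOrdering` class and the `BestFirst` method are hypothetical names, and the ranking function is passed in as a delegate so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RankOrdering
{
    // Hypothetical helper: returns the subjects that contain every term,
    // ordered from best (lowest rank) to worst.
    public static List<string> BestFirst(
        IEnumerable<string> subjects,
        IList<string> terms,
        Func<string, IList<string>, decimal> rank)
    {
        return subjects
            .Select(s => new { Subject = s, Score = rank(s, terms) })
            .Where(x => x.Score != decimal.MaxValue) // drop subjects missing a term
            .OrderBy(x => x.Score)                   // lower rank = better match
            .Select(x => x.Subject)
            .ToList();
    }
}
```

For example, calling `RankOrdering.BestFirst(new[] { "b a", "a b", "a ccc b" }, new[] { "a", "b" }, Rank)` should return "a b", "a ccc b", "b a", since their ranks are 0.0, 1.5 and 2.0 respectively.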
Examples
    Subject      Terms       Rank
    "a b"        "a"         0.0
    "b a"        "a"         1.0
    "ccc a b"    "a", "b"    1.0
    "a ccc b"    "a", "b"    1.5
    "a b"        "a", "b"    0.0
    "b a"        "a", "b"    2.0