对于我们的例子,让我们使用很棒的Levenshtein 距离算法。
所以假设我们有一个具有以下签名的函数:
private static int CalcLevenshteinDistance(string a, string b)
为了真正获得 and 之间的相似性,而不是distance,我们将使用:a
b
private static decimal CalcLevenshteinSimilarity(string a, string b)
{
return 1 - ((decimal)CalcLevenshteinDistance(a, b) /
Math.Max(a.Length, b.Length));
}
1
如果字符串完全相同,0
如果字符串根本不相似,或者介于两者之间,这将完全返回。例如,0.89
会是那个a
并且b
是89%
相似的(不错!)
为了帮助我们处理加权字段,让我们创建一个小助手类:
public class SuggestionField
{
public string SourceData { get; set; }
public string SuggestedData { get; set; }
public decimal Importance { get; set; }
}
这将表示将单个类型字段T
与源T
实例匹配所需的所有信息。
现在计算单个建议和源之间的加权相似度相当简单:
private static decimal RateSuggestion(IEnumerable<SuggestionField> fields)
{
return fields.Sum(x =>
x.Importance * CalcLevenshteinSimilarity(x.SourceData,
x.SuggestedData));
}
现在让我们将它包装在一个函数中,该函数可以获取所有可能的建议,以及SuggestionField
s 以一种非常酷且易于使用的方式:
public static IEnumerable<T> Suggest<T>
(IEnumerable<T> possibleSuggestions,
params Func<T, SuggestionField>[] fieldSelectors)
{
return possibleSuggestions
.Select(x => new
{
Suggestion = x,
Similarity = RateSuggestion(fieldSelectors.Select(f => f(x)))
})
.OrderByDescending(x => x.Similarity)
.TakeWhile(x => x.Similarity > 0.5m) // <-- Threshold here!
.Select(x => x.Suggestion);
}
好的,好的,那段代码乍一看可能有点混乱,但请放松。主要的混淆可能来自于params Func<T, SuggestionField>[] fieldSelectors
,因此也来自于Similarity = RateSuggestion(fieldSelectors.Select(f => f(x)))
。
对于那些精通 Linq 和所有带有选择器的游戏的人来说,可能已经了解如何使用该功能。无论如何,一会就清楚了!
用法:
// I'll be using anonymous types here, but you don't have to be lazy about it
var src = new {A = "Hello", B = "World", C = "and welcome!"};
var possibleSuggestions =
new[]
{
new {A = "Hola", B = "World", C = "Welcome!"},
new {A = "Bye", B = "world", C = "and fairwell"},
new {A = "Hell", B = "World", C = "arrives..."},
new {A = "Hello", B = "Earth", C = "and welcome!"},
new {A = "Hi", B = "world", C = "welcome!"}
};
var suggestions =
Suggest(possibleSuggestions,
x => new SuggestionField
{
SourceData = src.A,
SuggestedData = x.A,
Importance = 0.3m // 30%
},
x => new SuggestionField
{
SourceData = src.B,
SuggestedData = x.B,
Importance = 0.5m // 50%
},
x => new SuggestionField
{
SourceData = src.C,
SuggestedData = x.C,
Importance = 0.2m // 20%
}).ToArray();
这对您来说可能看起来不错,或者可以对其进行更改以使其更符合您的喜好,但我希望这个想法很清楚,并且有人会发现它有用;)
附言
当然,相似度阈值可以作为参数传入。随意添加任何想法和评论,以使其更好或更易读!