
Possible duplicate:
How to recognize similar words with different spelling

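I'm trying to get true when comparing these 3 strings: 'voest', 'vost' and 'vöst' (German culture), because they are the same word. (Strictly speaking, only oe and ö are the same, but for a case-insensitive (CI) DB collation, for example, all three are the same, and that is correct because 'vost' is a mistyped 'voest'.)

string.Compare(..) / string.Equals(..) always return false, no matter what arguments I pass to the method.

How can I make string.Compare() / Equals(..) return true?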


2 Answers


You can create a custom comparer that ignores umlauts:

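using System.Collections.Generic;
using System.Linq;

class IgnoreUmlautComparer : IEqualityComparer<string>
{
    // Maps each umlaut to its base vowel.
    private static readonly Dictionary<char, char> umlautReplacer = new Dictionary<char, char>()
    {
        {'ä','a'}, {'Ä','A'},
        {'ö','o'}, {'Ö','O'},
        {'ü','u'}, {'Ü','U'},
    };
    // Maps each two-letter transliteration of an umlaut to the same base vowel.
    private static readonly Dictionary<string, string> pseudoUmlautReplacer = new Dictionary<string, string>()
    {
        {"ae","a"}, {"Ae","A"},
        {"oe","o"}, {"Oe","O"},
        {"ue","u"}, {"Ue","U"},
    };

    // Reduces "voest", "vöst" and "vost" to the same string, "vost".
    private static string ignoreUmlaut(string s)
    {
        char value;
        string replaced = new string(s.Select(c => umlautReplacer.TryGetValue(c, out value) ? value : c).ToArray());
        foreach (var kv in pseudoUmlautReplacer)
            replaced = replaced.Replace(kv.Key, kv.Value);
        return replaced;
    }

    public bool Equals(string x, string y)
    {
        // Two nulls are equal; a null never equals a non-null string.
        if (x == null || y == null)
            return x == y;
        return ignoreUmlaut(x) == ignoreUmlaut(y);
    }

    // Hashes the reduced string so that equal inputs get equal hash codes,
    // which hash-based methods such as Distinct rely on.
    public int GetHashCode(string obj)
    {
        return ignoreUmlaut(obj).GetHashCode();
    }
}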

Now you can use this comparer with Enumerable methods such as Distinct:

string[] allStrings = new[]{"voest","vost","vöst"};
bool allEqual = allStrings.Distinct(new IgnoreUmlautComparer()).Count() == 1;
// --> true
answered 2012-11-26T10:33:55.777

You can try the CompareOptions.IgnoreNonSpace option when comparing. It won't solve voest vs. vost, but it does handle vost vs. vöst.

int a = new CultureInfo("de-DE").CompareInfo.Compare("vost", "vöst", CompareOptions.IgnoreNonSpace);
// a = 0; strings are equal.
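
Building on that, here is a minimal sketch that also covers voest: it first rewrites the digraphs ae/oe/ue to ä/ö/ü and then lets the de-DE CompareInfo ignore the diacritics. GermanCompare, LooselyEqual and FoldDigraphs are names made up for this illustration, and adding IgnoreCase is an assumption on my part, since the question does not say whether case should matter.

```csharp
using System;
using System.Globalization;

static class GermanCompare
{
    // Hypothetical helper, not a framework API: compares two strings after
    // folding digraphs to umlauts, ignoring diacritics and (by assumption) case.
    public static bool LooselyEqual(string x, string y)
    {
        CompareInfo compareInfo = new CultureInfo("de-DE").CompareInfo;
        return compareInfo.Compare(
            FoldDigraphs(x),
            FoldDigraphs(y),
            CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0;
    }

    // Rewrites the transliterations ae/oe/ue (and Ae/Oe/Ue) to real umlauts,
    // so that IgnoreNonSpace can then treat them like the plain vowel.
    private static string FoldDigraphs(string s)
    {
        return s.Replace("ae", "ä").Replace("oe", "ö").Replace("ue", "ü")
                .Replace("Ae", "Ä").Replace("Oe", "Ö").Replace("Ue", "Ü");
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(GermanCompare.LooselyEqual("voest", "vost"));
        Console.WriteLine(GermanCompare.LooselyEqual("voest", "vöst"));
        Console.WriteLine(GermanCompare.LooselyEqual("vost", "vöst"));
    }
}
```

Assuming IgnoreNonSpace behaves as described above, all three calls should report a match. The folding is deliberately crude: it also rewrites genuine ae/oe/ue sequences, so 'Quelle' and 'Qulle' would compare as equal, which is the same trade-off the dictionary-based comparer makes.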
answered 2012-11-26T10:04:19.857