0

我想对字符串列表进行排序。我有 1000 个地址(一些用空格分隔的自定义地址数据)。第二件事是我的搜索查询。现在我想获取所有单词标记(没有数字)并按最小距离对它们进行排序。

例如

string query = "123 HAM";
// 1. get only "HAM" token
// 2. count distances
// 3. sort by them
//distance("HAM", "12 HAM DRIVE") -> 0
//distance("HAM", "13 HAM DRIVE") -> 0
//distance("HAM", "14 HAMER DRIVE") -> 2
//distance("HAM", "37 HAMMERSMITH AVENUE") -> 8

如果我的查询标记是,那么和之间的HAM距离是 0,和之间的距离是 2(因为还有 2 个字母),等等。HAMHAMHAMHAMERHAMER

我得到“单词”标记:

private static IEnumerable<string> GetLetterTokens(string location)
{
    string[] words = location.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
    return words.Where(word => Regex.IsMatch(word.Trim(), @"^[a-zA-Z]+$"));
}

现在对于每个地址,我想计算这些距离并按它们排序。有什么快速的方法吗?我的意思是例如使用List<>.Sort.

谢谢你的建议:)

4

2 回答 2

2

我认为您可以使用Levenshtein Distance – LB

var result = addresses.OrderBy(a => 
         string.Join(" ", GetLetterTokens(a))
       , new LevenshteinDistance());

public class LevenshteinDistance : IComparer<String>
{
    /// <summary>
    /// Compute the distance between two strings.
    /// </summary>
    public int Compare(string s, string t)
    {
    int n = s.Length;
    int m = t.Length;
    int[,] d = new int[n + 1, m + 1];

    // Step 1
    if (n == 0)
    {
        return m;
    }

    if (m == 0)
    {
        return n;
    }

    // Step 2
    for (int i = 0; i <= n; d[i, 0] = i++)
    {
    }

    for (int j = 0; j <= m; d[0, j] = j++)
    {
    }

    // Step 3
    for (int i = 1; i <= n; i++)
    {
        //Step 4
        for (int j = 1; j <= m; j++)
        {
        // Step 5
        int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

        // Step 6
        d[i, j] = Math.Min(
            Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
            d[i - 1, j - 1] + cost);
        }
    }
    // Step 7
    return d[n, m];
    }
}
于 2012-08-28T10:02:48.813 回答
1

我认为这就是你要找的:

    string token = "HAM";
    List<string> addresses = new List<string>
    {
        "12 HAM DRIVE",
        "13 HAM DRIVE",
        "14 HAMER DRIVE",
        "37 HAMMERSMITH AVENUE",
        "15 HAM HAMER DRIVE",
    };

    var result = from a in addresses
                 let tokens = GetLetterTokens(a)
                 let distances = from t in tokens
                                 where t.Contains(token)
                                 select t.Length - token.Length
                 where distances.Any()
                 let distance = distances.Min()
                 orderby distance
                 select new
                 {
                     Address = a,
                     Distance = distance,
                 };

If you only want tokens that start with the token you look for use StartsWith instead of Contains.

于 2012-08-28T10:06:10.550 回答