使用 LINQ 增加乐趣
经过几次尝试后,我想出了一个理论上可以是单线的纯 LINQ 解决方案。我确实试图让它变得高效,但当然功能解决方案会导致重复计算,因为你不能保持状态。
我们从一些预处理开始,以节省以后的重复计算。是的,我知道我对 index 所做的做法是有问题的,但如果你小心的话,它会起作用并且很快就会到达那里:
var index = 0;
var lookup = original.ToLookup(s => s, s => index++);
怪物
var occurrences = userSelect
.Where(lookup.Contains)
.SelectMany((s, i) => lookup[s]
.Select(j => new {
User = userSelect.Skip(i),
Original = original.Skip(j),
Skipped = i
})
.Select(t => t.User.Zip(t.Original, (u, v) => Tuple.Create(u, v, t.Skipped))
.TakeWhile(tuple => tuple.Item1 == tuple.Item2)
)
.Select(u => new {
Word = s,
Start = u.Select(v => v.Item3).Min(),
Length = u.Count()
})
)
.GroupBy(v => v.Start + v.Length)
.Select(g => g.OrderBy(u => u.Start).First())
.GroupBy(v => v.Word)
.Select(g => g.OrderByDescending(u => u.Length).First())
.Select(w => Enumerable.Range(w.Start, w.Length).ToArray())
.ToList();
打印这个
foreach (var occurrence in occurrences) {
Console.WriteLine(
"Maximal match starting with '{0}': [{1}]",
userSelect[occurrence[0]],
string.Join(", ", occurrence)
);
}
给
Maximal match starting with 'the': [0, 1, 2]
Maximal match starting with 'dog': [3]
Maximal match starting with 'jumps': [4, 5]
很明显,您不想在生产中使用此代码,到目前为止,另一种(程序)解决方案会更好。但是,此解决方案的区别在于,除了lookup
. 当然也可以写成函数式:
var lookup = original.Select((s, i) => Tuple.Create)
.ToLookup(t => t.Item1, t => t.Item2);
这个怎么运作
热身,它创建了一个类似字典的结构,将每个单词关联original
到它出现在同一集合中的索引。这将在以后用于从 in 中的每个单词创建尽可能多的匹配序列userSelect
(例如,“the”将导致它成为两个匹配序列,因为它在 中出现两次original
)。
然后:
.Where(lookup.Contains)
这很容易,它会从考虑中删除所有userSelect
未出现在 中的单词original
。
// For each place where the word s appears in original...
.SelectMany((s, i) => lookup[s]
// Define the two subsequences of userSelect and original to work on.
// We are trying to find the number of identical elements until first mismatch.
.Select(j => new { User = userSelect.Skip(i), Original = original.Skip(j), Skipped = j })
// Use .Zip to find this subsequence
.Select(t => t.User.Zip(t.Original, (u, v) => Tuple.Create(u, v, t.Skipped)).TakeWhile(tuple => tuple.Item1 == tuple.Item2))
// Note the index in original where the subsequence started and its length
.Select(u => new { Word = s, Start = u.Select(v => v.Item3).Min(), Length = u.Count() })
)
在这一点上,我们已经将每个匹配的单词投影到一个具有和属性userSelect
的匿名对象中。但是,匹配长度为 N 的序列也将导致长度为 N-1、N-2、... 1 的较小匹配序列。Start
Length
这里的关键是要意识到对于这些集合中的所有子序列Start + Length
都是相同的;此外,来自不同集合的子序列将具有不同的总和Start + Length
。因此,让我们利用缩减结果:
// Obvious from the above
.GroupBy(v => v.Start + v.Length)
// We want to keep the longest subsequence. Since Start + Length is constant for
// all, it follows the one with the largest Length has the smallest Start:
.Select(g => g.OrderBy(u => u.Start).First())
这仍然会给我们留下与每个单词userSelect
出现的次数一样多的匹配项original
。所以让我们也把它减少到最长的匹配:
.GroupBy(v => v.Word)
.Select(g => g.OrderByDescending(u => u.Length).First())
我们现在有一个像{ Word = "the", Start = 0, Length = 3 }
. 让我们将其转换为索引数组userSelect
:
.Select(w => Enumerable.Range(w.Start, w.Length).ToArray())
最后将所有这些阵列放在同一个集合中,任务完成!