I have the following code, which works:
string[] userSelect = new string[] {"the", "sled", "had", "not", "moved", ";", "the", "driver", "was", "surprised", "."};
string[] original = new string[] {"the", "driver", "was", "surprised", ",", "too", ";", "the", "sled", "had", "not", "moved", "."};
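var matches =
    (from l in userSelect.Select((s, i) => new { s, i })
     join r in original.Select((s, i) => new { s, i })
     on l.s equals r.s
     // pairs with the same offset between the two sentences belong together
     group l by r.i - l.i into g
     // l.i - j is constant across a run of consecutive indices within one offset group
     from m in g.Select((l, j) => new { l.i, j = l.i - j, k = g.Key })
     group m by new { m.j, m.k } into h
     // longest runs first, so the overlap filter below keeps the longest match
     // (assumed ordering; it is needed to get the result shown)
     orderby h.Count() descending
     select h.Select(t => t.i).ToArray())
    .ToArray();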
// drop any match whose indices are all contained in an earlier match
int take = 0;
var filtered = matches.Where(m => !matches.Take(take++)
                                          .Any(n => m.All(i => n.Contains(i))))
                      .ToArray();
Using the above, I get this result:
{{0,1,2,3,4}, {6,7,8,9}, {5,6}, {10}}
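Note the overlap at index 6. It occurs because both {"the", "driver", "was", "surprised"} and {";", "the"} appear in the original sentence.
For cases like this I need a secondary filter. It should find every index value that overlaps between arrays and extract it into its own array, so that no index value appears in more than one array. The output should separate the overlapping parts, like this: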
{{0,1,2,3,4}, {7,8,9}, {10}, {6}, {5}}
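
One possible way to approach such a secondary filter, offered only as a rough sketch: record for every index which of the filtered arrays contain it, then walk the indices in ascending order and start a new array whenever the index is not consecutive or the set of containing arrays changes. The names `OverlapSplitter` and `SplitOverlaps` are hypothetical, and the sketch assumes that each output array should hold consecutive indices and that the order of the output arrays does not matter (it returns them sorted by first index rather than in the order shown above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OverlapSplitter
{
    // Hypothetical helper: splits index arrays so that no index appears
    // in more than one output array.
    public static int[][] SplitOverlaps(int[][] groups)
    {
        // For every index, record which input arrays contain it (sorted by index).
        var owners = new SortedDictionary<int, List<int>>();
        for (int g = 0; g < groups.Length; g++)
        {
            foreach (int i in groups[g].Distinct())
            {
                if (!owners.TryGetValue(i, out var list))
                    owners[i] = list = new List<int>();
                list.Add(g);
            }
        }

        var result = new List<int[]>();
        var run = new List<int>();
        string prevKey = null;
        int prevIndex = 0;

        // Start a new run whenever the index is not consecutive
        // or the set of owning arrays changes.
        foreach (var pair in owners)
        {
            string key = string.Join(",", pair.Value);
            if (run.Count > 0 && (pair.Key != prevIndex + 1 || key != prevKey))
            {
                result.Add(run.ToArray());
                run.Clear();
            }
            run.Add(pair.Key);
            prevKey = key;
            prevIndex = pair.Key;
        }
        if (run.Count > 0) result.Add(run.ToArray());
        return result.ToArray();
    }

    static void Main()
    {
        // The filtered result from the question, hard-coded for the demo.
        int[][] filtered =
        {
            new[] { 0, 1, 2, 3, 4 },
            new[] { 6, 7, 8, 9 },
            new[] { 5, 6 },
            new[] { 10 },
        };
        foreach (int[] part in SplitOverlaps(filtered))
            Console.WriteLine("{" + string.Join(",", part) + "}");
        // prints {0,1,2,3,4} {5} {6} {7,8,9} {10}, one per line
    }
}
```

If the exact output order matters, the returned arrays could be re-sorted afterwards, for example with `OrderByDescending(a => a.Length)`.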