c# - 从多个单独的列表中“配对”匹配项的最快非 LINQ 算法是什么？

Question

重要的提示

对于将此标记为重复的人，请理解我们不想要基于 LINQ 的解决方案。我们的真实示例有数万个范围内的几个原始列表，而基于 LINQ 的解决方案的性能不足以满足我们的需求，因为它们必须多次遍历列表才能执行其功能，并随着每个新的源列表进行扩展。

这就是为什么我们专门寻找一种非 LINQ算法，例如下面这个答案中建议的算法，它们通过枚举器同时遍历所有列表，并且只遍历一次。这似乎是迄今为止最好的，但我想知道是否还有其他人。

现在回到问题...

为了解释我们的问题，考虑这个假设问题：

我有多个列表，但为了保持这个示例简单，我们将其限制为两个，ListA 和 ListB，它们的类型都是List<int>. 他们的数据如下：

List A    List B
  1         2
  2         3
  4         4
  5         6
  6         8
  8         9
  9        10

...但是真正的列表可以有数万行。

我们接下来有一个名为 ListPairing 的类，其简单定义如下：

public class ListPairing
{
    public int? ASide{ get; set; }
    public int? BSide{ get; set; }
}

其中每个 'side' 参数实际上代表一个列表。（即如果有四个列表，它也会有一个 Cside 和一个 Dside。）

我们试图做的是构造一个List<ListPairing>初始化数据如下：

A Side    B Side
  1         -
  2         2
  -         3
  4         4
  5         -
  6         6
  8         8
  9         9
  -        10

Again, note there is no row with '7'

如您所见，结果看起来像一个完整的外部联接。但是，请参阅下面的更新。

现在开始，我们可以简单地这样做......

var finalList = ListA.Select(valA => new ListPairing(){ ASide = valA} );

哪个产生...

A Side    B Side
  1         -
  2         -
  4         -
  5         -
  6         -
  8         -
  9         -

现在我们要回填列表 B 中的值。这需要首先检查是否存在与 BSide 匹配的 ASide 的现有 ListPairing，如果有，则设置 BSide。

如果不存在具有匹配 ASide 的现有 ListPairing，则仅使用 BSide 集实例化一个新的 ListPairing（ASide 为空白。）

但是，考虑到所有必需的“FindFirst”调用，我觉得这不是最有效的方法。（这些列表可能长达数万个项目。）

但是，将这些列表合并一次会产生以下值...

1, 2, 3, 4, 5, 6, 8, 9, 10 (Note there is no #7)

我的想法是以某种方式使用值的有序联合，然后同时“遍历”两个列表，根据需要建立 ListPairings。这消除了对 FindFirst 的重复调用，但我想知道这是否是最有效的方法。

想法？

更新

人们建议这是使用 LINQ 获取完整外部联接的副本，因为结果是相同的......

我不是在LINQ完全外部连接之后。我追求高性能算法。

因此，我已经更新了这个问题。

我提出这个问题的原因是执行该功能所需的 LINQ 对我们的需求来说太慢了。在我们的模型中，实际上有四个列表，每个列表可以有数万行。这就是为什么我在最后建议使用 ID 的“联合”方法来获取要遍历的唯一“键”列表，但我认为发布的答案与您一样，但使用枚举器是一种更好的方法不需要预先提供 ID 列表。这将产生一次通过列表中所有项目的单次传递，这将轻松胜过基于 LINQ 的方法。

score 2 · Accepted Answer

结果并没有我希望的那么整洁，但是如果两个输入列表都已排序，那么您可以一起浏览它们，比较每个列表的头部元素：如果它们相等，则您有一对，否则发出最小的一个并推进该列表。

public static IEnumerable<ListPairing> PairUpLists(IEnumerable<int> sortedAList,
                                                   IEnumerable<int> sortedBList)
{
    // Should wrap these two in using() per Servy's comment with braces around
    // the rest of the method.
    var aEnum = sortedAList.GetEnumerator();
    var bEnum = sortedBList.GetEnumerator();
    bool haveA = aEnum.MoveNext();
    bool haveB = bEnum.MoveNext();

    while (haveA && haveB)
    {
        // We still have values left on both lists.
        int comparison = aEnum.Current.CompareTo(bEnum.Current);
        if (comparison < 0)
        {
            // The heads of the two remaining sequences do not match and A's is
            // lower. Generate a partial pair with the head of A and advance the
            // enumerator.
            yield return new ListPairing() {ASide = aEnum.Current};
            haveA = aEnum.MoveNext();
        }
        else if (comparison == 0)
        {
            // The heads of the two sequences match. Generate a pair.
            yield return new ListPairing() {
                    ASide = aEnum.Current,
                    BSide = bEnum.Current
                };
            // Advance both enumerators
            haveA = aEnum.MoveNext();
            haveB = bEnum.MoveNext();
        }
        else
        {
            // No match and B is the lowest. Generate a partial pair with B.
            yield return new ListPairing() {BSide = bEnum.Current};
            // and advance the enumerator
            haveB = bEnum.MoveNext();
        }
    }
    if (haveA)
    {
        // We still have elements on list A but list B is exhausted.
        do
        {
            // Generate a partial pair for all remaining A elements.
            yield return new ListPairing() { ASide = aEnum.Current };
        } while (aEnum.MoveNext());
    }
    else if (haveB)
    {
        // List A is exhausted but we still have elements on list B.
        do
        {
            // Generate a partial pair for all remaining B elements.
            yield return new ListPairing() { BSide = bEnum.Current };
        } while (bEnum.MoveNext());
    }
}

score 0 · Accepted Answer

var list1 = new List<int?>(){1,2,4,5,6,8,9};
var list2 = new List<int?>(){2,3,4,6,8,9,10};

var left = from i in list1
            join k in list2 on i equals k
            into temp
            from k in temp.DefaultIfEmpty()
            select new {a = i, b = (i == k) ? k : (int?)null};


var right = from k in list2
            join i in list1 on k equals i
            into temp
            from i in temp.DefaultIfEmpty()
            select new {a = (i == k) ? i : (int?)i , b = k};

var result = left.Union(right);

如果您需要与您的示例相同的顺序，那么您将需要提供一个索引和顺序（然后删除重复项）

var result = left.Select((o,i) => new {o.a, o.b, i}).Union(right.Select((o, i) => new {o.a, o.b, i})).OrderBy( o => o.i);

result.Select( o => new {o.a, o.b}).Distinct();

c# - 从多个单独的列表中“配对”匹配项的最快非 LINQ 算法是什么？

重要的提示

更新

2 回答 2

Related

Reference