c# - 为什么从 IList 中按索引删除比从 ISet 中按项目删除要差得多？

Question

编辑：我将添加一些基准测试结果。列表中大约有 1000 - 5000 个项目，IList并且RemoveAt节拍ISet和Remove，但这并不值得担心，因为差异很小。当收藏规模扩大到 10000 或更多时，真正的乐趣就开始了。我只发布那些数据

昨晚我在这里回答了一个问题，遇到了一个奇怪的情况。

首先是一组简单的方法：

static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
    return rnd.Next(source.Count);
}

public static T GetRandom<T>(this IList<T> source)
{
    return source[source.GetRandomIndex()];
}

-------------------------------------------------- -------------------------------------------------- --------------------------------

假设我从集合中随机删除 N 个项目。我会写这个函数：

public static void RemoveRandomly1<T>(this ISet<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove; 
    var inList = source.ToList();

    int i = 0;
    while (source.Count > countToRemain)
    {
        source.Remove(inList.GetRandom()); 
        i++;
    }
}

或者

public static void RemoveRandomly2<T>(this IList<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove;

    int j = 0;
    while (source.Count > countToRemain)
    {
        source.RemoveAt(source.GetRandomIndex()); 
        j++; 
    }
}

如您所见，第一个函数是为 an 编写的ISet，第二个函数是为 normal编写的IList。在第一个函数中，我从中按项目ISet和按索引删除IList，我相信这两者都是O(1). 为什么第二个函数的性能比第一个差很多，尤其是当列表变大的时候？

赔率（我的看法）：

1) 在第一个函数中，ISet将转换为一个IList（从中获取随机项IList），因为在第二个函数中没有执行这样的事情。

优势 IList。

2）在第一个函数中调用GetRandomItem了，而在第二个函数中，调用GetRandomIndex了，这又少了一步。

虽然微不足道，但优势 IList。

3）在第一个函数中，随机项是从一个单独的列表中获得的，因此获得的项可能已经从ISet. 这导致在while第一个函数的循环中进行更多的迭代。在第二个函数中，随机索引是从正在迭代的源中获取的，因此永远不会重复迭代。我已经对此进行了测试并验证了这一点。

i > j 总是，优势IList。

我认为这种行为的原因是List在添加或删除项目时需要不断调整大小。但显然在其他一些测试中没有。我跑了：

public static void Remove1(this ISet<int> set)
{
    int count = set.Count;
    for (int i = 0; i < count; i++)
    {
        set.Remove(i + 1);
    }
}

public static void Remove2(this IList<int> lst)
{
    for (int i = lst.Count - 1; i >= 0; i--)
    {
        lst.RemoveAt(i);
    }
}

发现第二个函数运行得更快。

试验台：

var f = Enumerable.Range(1, 100000);

var s = new HashSet<int>(f);
var l = new List<int>(f);

Benchmark(() =>
{
    //some examples...

    s.RemoveRandomly1(2500);
    l.RemoveRandomly2(2500);

    s.Remove1();
    l.Remove2();

}, 1);

public static void Benchmark(Action method, int iterations = 10000)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < iterations; i++)
        method();

    sw.Stop();
    MsgBox.ShowDialog(sw.Elapsed.TotalMilliseconds.ToString());
}

只是想知道这两种结构是什么..谢谢..

结果：

var f = Enumerable.Range(1, 10000);

s.RemoveRandomly1(7500); => 5ms
l.RemoveRandomly2(7500); => 20ms


var f = Enumerable.Range(1, 100000);

s.RemoveRandomly1(7500); => 7ms
l.RemoveRandomly2(7500); => 275ms


var f = Enumerable.Range(1, 1000000);

s.RemoveRandomly1(75000); => 50ms
l.RemoveRandomly2(75000); => 925000ms

对于大多数典型的需求，一个列表会做..！

score 4 · Accepted Answer

首先，IList 和 ISet 不是任何东西的实现。我可以编写一个运行方式非常不同的 IList 或 ISet 实现，因此具体实现是重要的（在您的情况下为 List 和 HashSet）。

按索引访问列表项是 O(1)但不删除RemoveAt是 O(n)。

从末尾删除列表会很快，因为它不需要复制任何东西，它只是递减它的内部计数器，该计数器存储它有多少项目，直到底层数组中的空点数量低于阈值，此时它会将数组复制到较小的数组。一旦达到底层数组的最大容量，它就会创建一个两倍大小的新数组并复制元素。如果您低于某个阈值，它将创建一个大小为一半的数组并复制元素。它使用长度属性跟踪它有多大，因此未使用的插槽看起来就像它们不存在一样。

从列表中随机删除意味着它必须复制索引之后的所有数组条目，以便它们向下滑动一个位置，这本质上非常慢，特别是当列表的大小变大时。如果您有一个包含 100 万个条目的列表，并且您删除了索引 500,000 处的某些内容，则它必须将数组的后半部分复制到一个位置。

c# - 为什么从 IList 中按索引删除比从 ISet 中按项目删除要差得多？

1 回答 1

Related

Reference