c# - 如何使我的函数在 ArrayList 上运行得与“包含”一样快？

Question

我无法弄清楚该Contains方法在 an 中查找元素ArrayList所需的时间与我编写的用于执行相同操作的小函数所需的时间之间的差异。该文档指出Contains执行线性搜索，因此它应该是 inO(n)而不是任何其他更快的方法。但是，虽然确切的值可能不相关，但该Contains方法会在00:00:00.1087087几秒钟内返回，而我的函数需要00:00:00.1876165. 它可能不多，但是在处理更大的数组时，这种差异会变得更加明显。我错过了什么，我应该如何编写我的函数来匹配Contains的表现？

我在 .NET 3.5 上使用 C#。

public partial class Window1 : Window
{
    public bool DoesContain(ArrayList list, object element)
    {
        for (int i = 0; i < list.Count; i++)
            if (list[i].Equals(element)) return true;

        return false;
    }

    public Window1()
    {
        InitializeComponent();

        ArrayList list = new ArrayList();
        for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);

        Stopwatch sw = new Stopwatch();
        sw.Start();

        //Console.Out.WriteLine(list.Contains("zzz 9000000") + " " + sw.Elapsed);
        Console.Out.WriteLine(DoesContain(list, "zzz 9000000") + " " + sw.Elapsed);
    }
}

编辑：

好的，现在，小伙子们，看：

public partial class Window1 : Window
{
    public bool DoesContain(ArrayList list, object element)
    {
        int count = list.Count;
        for (int i = count - 1; i >= 0; i--)
            if (element.Equals(list[i])) return true;

        return false;
    }


    public bool DoesContain1(ArrayList list, object element)
    {
        int count = list.Count;
        for (int i = 0; i < count; i++)
            if (element.Equals(list[i])) return true;

        return false;
    }

    public Window1()
    {
        InitializeComponent();

        ArrayList list = new ArrayList();
        for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);

        Stopwatch sw = new Stopwatch();
        long total = 0;
        int nr = 100;

        for (int i = 0; i < nr; i++)
        {
            sw.Reset();
            sw.Start();
            DoesContain(list,"zzz");
            total += sw.ElapsedMilliseconds;
        }
        Console.Out.WriteLine(total / nr);


        total = 0;
        for (int i = 0; i < nr; i++)
        {
            sw.Reset();
            sw.Start();
            DoesContain1(list, "zzz");
            total += sw.ElapsedMilliseconds;
        }
        Console.Out.WriteLine(total / nr);


        total = 0;
        for (int i = 0; i < nr; i++)
        {
            sw.Reset();
            sw.Start();
            list.Contains("zzz");
            total += sw.ElapsedMilliseconds;
        }
        Console.Out.WriteLine(total / nr);
    }
  }

我为我的函数的两个版本（前向和后向循环）和默认函数平均运行了 100 次Contains。对于我的函数来说，我得到的时间是毫秒，136而对于这个版本来说133是一个遥远的赢家。好吧，如果在您争辩说数据稀缺之前，我的结论是基于第一次独立运行的，您对这个测试有什么看法？不仅平均表现更好，而且在每次运行中都能获得始终如一的更好结果。那么，对于 3rd 方功能，这里是否存在某种劣势，或者什么？87ContainsContains

score 11 · Accepted Answer

首先，您不会多次运行它并比较平均值。

其次，您的方法在实际运行之前不会被触发。因此，及时编译时间被添加到其执行时间中。

一个真正的测试将运行多次并对结果进行平均（任何数量的事情都可能导致运行 X 的总 Y 中的一个或另一个变慢），并且您的程序集应该使用ngen.exe预先设置。

score 5 · Accepted Answer

当您使用 .NET 3.5 时，为什么要从 .NETArrayList开始，而不是List<string>?

有几件事要尝试：

您可以查看使用foreach而不是for循环是否有帮助

您可以缓存计数：

public bool DoesContain(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
    {
        if (list[i].Equals(element))
        {
            return true;
        }
        return false;
    }
}

您可以反转比较：
```
if (element.Equals(list[i]))
```

虽然我不希望这些中的任何一个产生显着（积极）的差异，但它们是我接下来要尝试的事情。

您是否需要多次进行此收容测试？如果是这样，您可能想要构建一个HashSet<T>并重复使用它。

score 2 · Accepted Answer

我不确定是否允许您发布 Reflector 代码，但是如果您使用 Reflector 打开该方法，您可以看到它本质上是相同的（对空值进行了一些优化，但您的测试工具不包含空值）。

我能看到的唯一区别是调用list[i]会检查边界，i而 Contains 方法则不会。

score 1 · Accepted Answer

使用一个非常好的优化器根本不应该有区别，因为语义似乎是相同的。但是，现有的优化器可以优化您的功能，不如硬编码Contains的优化。一些优化点：

每次比较一个属性可能比向下计数和比较 0 慢
函数调用本身有其性能损失
使用迭代器而不是显式索引可以更快（foreach循环而不是普通for）

score 1 · Accepted Answer

首先，如果您使用的是提前知道的类型，我建议您使用泛型。所以 List 而不是 ArrayList。在幕后，ArrayList.Contains 实际上比您所做的要多一点。以下来自反射器：

public virtual bool Contains(object item)
{
    if (item == null)
    {
        for (int j = 0; j < this._size; j++)
        {
            if (this._items[j] == null)
            {
                return true;
            }
        }
        return false;
    }
    for (int i = 0; i < this._size; i++)
    {
        if ((this._items[i] != null) && this._items[i].Equals(item))
        {
            return true;
        }
    }
    return false;
}

请注意，它在为 item 传递 null 值时分叉自己。但是，由于您的示例中的所有值都不为 null，因此在开始和第二个循环中对 null 的额外检查理论上应该花费更长的时间。

你确定你正在处理完全编译的代码吗？即，当您的代码第一次运行时，它会在框架显然已经编译的地方进行 JIT 编译。

score 1 · Accepted Answer

使用下面的代码，我能够相对一致地获得以下时间（在几毫秒内）：
1：190ms DoesContainRev
2：198ms DoesContainRev1
3：188ms DoesContainFwd
4：203ms DoesContainFwd1
5：199ms Contains

这里有几点需要注意。

这是使用命令行中的发布编译代码运行的。许多人在 Visual Studio 调试环境中对代码进行基准测试时犯了错误，并不是说这里有人做过，而是要小心。

list[i].Equals(element)似乎比element.Equals(list[i]).

using System;
using System.Diagnostics;
using System.Collections;


namespace ArrayListBenchmark
{
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        const int arrayCount = 10000000;
        ArrayList list = new ArrayList(arrayCount);
        for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i);
    sw.Start();
    DoesContainRev(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    DoesContainRev1(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    DoesContainFwd(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    DoesContainFwd1(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    list.Contains("zzz");
    sw.Stop();
    Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    Console.ReadKey();
}
public static bool DoesContainRev(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = count - 1; i >= 0; i--)
        if (element.Equals(list[i])) return true;

    return false;
}
public static bool DoesContainFwd(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
        if (element.Equals(list[i])) return true;

    return false;
}
public static bool DoesContainRev1(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = count - 1; i >= 0; i--)
        if (list[i].Equals(element)) return true;

    return false;
}
public static bool DoesContainFwd1(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
        if (list[i].Equals(element)) return true;

    return false;
}
         }
        }

score 1 · Accepted Answer

在您编辑之后，我复制了代码并对其进行了一些改进。
差异不可重现，结果证明是测量/舍入问题。

要看到这一点，请将您的运行更改为以下形式：

    sw.Reset();
    sw.Start();
    for (int i = 0; i < nr; i++)
    {          
        DoesContain(list,"zzz");            
    }
    total += sw.ElapsedMilliseconds;
    Console.WriteLine(total / nr);

我只是移动了一些行。JIT 问题对于这种重复次数来说是微不足道的。

score 0 · Accepted Answer

0

阅读评论后修改：

它不使用一些哈希算法来实现快速查找。

于 2010-02-25T20:35:52.357 回答

score 0 · Accepted Answer

我的猜测是 ArrayList 是用 C++ 编写的，并且可以利用一些微优化（注意：这是一个猜测）。

例如，在 C++ 中，您可以使用指针算法（特别是递增指针以迭代数组）比使用索引更快。

score 0 · Accepted Answer

使用SortedList<TKey,TValue>或用于基于键的快速访问Dictionary<TKey, TValue>。System.Collections.ObjectModel.KeyedCollection<TKey, TValue>

var list = new List<myObject>(); // Search is sequential
var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast
var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast

KeyedCollection<TKey, TValue>也很快并且允许索引查找，但是，它需要被继承，因为它是抽象的。因此，您需要一个特定的集合。但是，通过以下内容，您可以创建一个通用的KeyedCollection.

public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> {
   public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) {
      this.keyExtractor = keyExtractor;
   }

   private Func<TValue, TKey> keyExtractor;

   protected override TKey GetKeyForItem(TValue value) {
      return this.keyExtractor(value);
   }
}

使用 KeyedCollection 的优点是 Add 方法不需要指定键。

score 0 · Accepted Answer

使用数组结构，如果没有任何附加信息，您无法比 O(n) 更快地搜索。如果你知道数组是排序的，那么你可以使用二分搜索算法并且只花费 o(log(n)) 否则你应该使用一个集合。

c# - 如何使我的函数在 ArrayList 上运行得与“包含”一样快？

11 回答 11

Related

Reference