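When combining inheritance and generics, I keep failing to understand the performance characteristics of using `Func<...>` throughout my code, and it is a combination I find myself using all the time.
Let me start with a minimal test case so we all know what we're talking about. Then I'll post the results, and then I'll explain what I expected and why...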
Minimal test case
```csharp
using System;

public class GenericsTest2 : GenericsTest<int>
{
    static void Main(string[] args)
    {
        GenericsTest2 at = new GenericsTest2();

        at.test(at.func);   // lambda (a) => Check(a), stored in the generic base class
        at.test(at.Check);  // generic base-class method converted to a Func
        at.test(at.func2);  // the same lambda, stored in the derived class
        at.test(at.Check2); // non-generic method converted to a Func
        at.test((a) => a.Equals(default(int))); // inline lambda

        Console.ReadLine();
    }

    public GenericsTest2()
    {
        func = func2 = (a) => Check(a);
    }

    protected Func<int, bool> func2;

    public bool Check2(int value)
    {
        return value.Equals(default(int));
    }

    // Invokes func 100,000,000 times and prints the elapsed time.
    public void test(Func<int, bool> func)
    {
        using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                func(i);
            }
        }
    }
}

public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }

    protected Func<T, bool> func;
}

// Minimal timing helper: reports the time elapsed between construction and Dispose.
public class Stopwatch : IDisposable
{
    public Stopwatch(Action<TimeSpan> act)
    {
        this.act = act;
        this.start = DateTime.UtcNow;
    }

    private Action<TimeSpan> act;
    private DateTime start;

    public void Dispose()
    {
        act(DateTime.UtcNow.Subtract(start));
    }
}
```
Results
Took 2.50s -> at.test(at.func);
Took 1.97s -> at.test(at.Check);
Took 2.48s -> at.test(at.func2);
Took 0.72s -> at.test(at.Check2);
Took 0.81s -> at.test((a) => a.Equals(default(int)));
What I expect, and why
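I expected this code to run at exactly the same speed for all 5 methods. More precisely, I expected it to be even faster than any of them, namely as fast as this: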
```csharp
using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
{
    for (int i = 0; i < 100000000; ++i)
    {
        bool b = i.Equals(default(int));
    }
}
// this takes 0.32s ?!?
```
I expected every variant to take 0.32s, because I see no reason for the JIT compiler not to inline the code in this particular case.
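Looking more closely, I don't understand these performance numbers at all:
- `at.func` is passed to the function and cannot change during execution. Why isn't it inlined?
- `at.Check2` is apparently faster than `at.Check`, even though neither can be overridden and `at.Check` is fixed like a rock in the case of class `GenericsTest2`.
- I see no reason for things to get slower when passing an inline `Func<int, bool>` instead of a method that is converted to a `Func`.
- Why is the difference between test cases 2 and 3 a whopping 0.5s, while the difference between cases 4 and 5 is only 0.1s? Shouldn't they be the same?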
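One structural difference between the cases in the list above can be made visible without a benchmark: converting `at.Check` to a `Func<int, bool>` produces a delegate that targets `Check` directly, whereas `at.func` and `at.func2` hold a delegate to the compiler-generated lambda `(a) => Check(a)`, which then calls `Check` in turn. The sketch below (the names `Base<T>`, `Derived` and `Wrapped` are hypothetical, chosen to mirror the test case) uses `Delegate.Method` to show which method each delegate actually targets. It does not measure anything; it only shows that the lambda form adds one extra call per invocation.

```csharp
using System;

// Hypothetical stand-in for GenericsTest<T>.
public class Base<T>
{
    public bool Check(T value) => value.Equals(default(T));
}

// Hypothetical stand-in for GenericsTest2.
public class Derived : Base<int>
{
    // Mirrors `func = func2 = (a) => Check(a);` from the test case.
    public Func<int, bool> Wrapped;

    public Derived()
    {
        Wrapped = a => Check(a);
    }
}

public static class DelegateTargets
{
    public static void Main()
    {
        var d = new Derived();

        // Method-group conversion: the delegate points straight at Check.
        Func<int, bool> direct = d.Check;
        Console.WriteLine(direct.Method.Name);    // "Check"

        // Lambda: the delegate points at a compiler-generated method whose
        // body calls Check, so each invocation makes one additional call.
        Console.WriteLine(d.Wrapped.Method.Name); // compiler-generated name, not "Check"

        // Both delegates compute the same result.
        Console.WriteLine(direct(0) == d.Wrapped(0));
    }
}
```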
Question
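I really want to understand this... What is going on here that makes using a generic base class 10x slower than inlining the whole lot?
So, basically, the question is: why is this happening, and how can I fix it?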
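A commonly used workaround, sketched here under the assumption that the predicate is known at compile time, is to replace the `Func<,>` with a struct that implements a small interface and to pass it as a generic type parameter constrained to `struct`. The JIT compiles a separate, specialized body of generic code for each value-type instantiation, so inside that body the call to `Invoke` has a single known target and is typically eligible for inlining, which a call through a delegate generally is not. `IPredicate<T>`, `IsDefault<T>`, `RunLoop` and `StructPredicateDemo` are hypothetical names, not part of the original code, and whether inlining actually happens depends on the runtime and JIT version, so measure on the target runtime.

```csharp
using System;

// Hypothetical interface standing in for Func<T, bool>.
public interface IPredicate<T>
{
    bool Invoke(T value);
}

// Equivalent of Check, but via IEquatable<T>, so int is not boxed.
public struct IsDefault<T> : IPredicate<T> where T : IEquatable<T>
{
    public bool Invoke(T value) => value.Equals(default(T));
}

public static class StructPredicateDemo
{
    // TPredicate is a value type, so the JIT generates code specialized for
    // each concrete predicate struct; pred.Invoke is then a direct call.
    public static int RunLoop<TPredicate>(TPredicate pred, int iterations)
        where TPredicate : struct, IPredicate<int>
    {
        int hits = 0;
        for (int i = 0; i < iterations; ++i)
        {
            // Counting hits keeps the result observably used.
            if (pred.Invoke(i)) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        var sw = System.Diagnostics.Stopwatch.StartNew();
        int hits = RunLoop(new IsDefault<int>(), 100000000);
        sw.Stop();
        Console.WriteLine("{0} hit(s), took {1:0.00}s", hits, sw.Elapsed.TotalSeconds);
    }
}
```

The trade-off is that the predicate becomes part of the type rather than a value that can be swapped at runtime, so this fits cases like `at.Check2` better than a field such as `func` that is assigned in a constructor.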
UPDATE
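Based on all the comments so far (thanks!), I did some more digging.
First, a new set of results from repeating the tests with the loop made 5x larger and the whole set executed 4 times. I switched to `System.Diagnostics.Stopwatch` and added more tests (with descriptions).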
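For reference, a sketch of the original `Stopwatch` helper rebuilt on top of `System.Diagnostics.Stopwatch`, which uses a high-resolution counter where available instead of `DateTime.UtcNow` (whose resolution can be coarse, historically around 15 ms on Windows). The class name `TimedScope` is hypothetical; it is renamed so it does not clash with the framework type.

```csharp
using System;
using System.Diagnostics;

// Hypothetical replacement for the DateTime-based Stopwatch class in the
// test case; disposing it reports the elapsed time to the callback.
public sealed class TimedScope : IDisposable
{
    private readonly Action<TimeSpan> report;
    private readonly Stopwatch watch;

    public TimedScope(Action<TimeSpan> report)
    {
        this.report = report;
        this.watch = Stopwatch.StartNew(); // high-resolution timer when available
    }

    public void Dispose()
    {
        watch.Stop();
        report(watch.Elapsed);
    }
}

public static class TimedScopeDemo
{
    public static void Main()
    {
        Func<int, bool> func = a => a.Equals(default(int));
        int hits = 0;
        using (new TimedScope(ts => Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds)))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                if (func(i)) hits++; // use the result so it is observably consumed
            }
        }
        Console.WriteLine(hits);
    }
}
```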
(Baseline implementation took 2.61s)
--- Run 0 ---
Took 3.00s for (a) => at.Check2(a)
Took 12.04s for Check3<int>
Took 12.51s for (a) => GenericsTest2.Check(a)
Took 13.74s for at.func
Took 16.07s for GenericsTest2.Check
Took 12.99s for at.func2
Took 1.47s for at.Check2
Took 2.31s for (a) => a.Equals(default(int))
--- Run 1 ---
Took 3.18s for (a) => at.Check2(a)
Took 13.29s for Check3<int>
Took 14.10s for (a) => GenericsTest2.Check(a)
Took 13.54s for at.func
Took 13.48s for GenericsTest2.Check
Took 13.89s for at.func2
Took 1.94s for at.Check2
Took 2.61s for (a) => a.Equals(default(int))
--- Run 2 ---
Took 3.18s for (a) => at.Check2(a)
Took 12.91s for Check3<int>
Took 15.20s for (a) => GenericsTest2.Check(a)
Took 12.90s for at.func
Took 13.79s for GenericsTest2.Check
Took 14.52s for at.func2
Took 2.02s for at.Check2
Took 2.67s for (a) => a.Equals(default(int))
--- Run 3 ---
Took 3.17s for (a) => at.Check2(a)
Took 12.69s for Check3<int>
Took 13.58s for (a) => GenericsTest2.Check(a)
Took 14.27s for at.func
Took 12.82s for GenericsTest2.Check
Took 14.03s for at.func2
Took 1.32s for at.Check2
Took 1.70s for (a) => a.Equals(default(int))
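What I notice from these results is that things get a lot slower as soon as generics come into play. Digging deeper, I found this IL for the non-generic implementation: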
```
L_0000: ldarga.s 'value'
L_0002: ldc.i4.0
L_0003: call instance bool [mscorlib]System.Int32::Equals(int32)
L_0008: ret
```
And this for all the generic implementations:
```
L_0000: ldarga.s 'value'
L_0002: ldloca.s CS$0$0000
L_0004: initobj !T
L_000a: ldloc.0
L_000b: box !T
L_0010: constrained. !T
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
L_001b: ret
```
While most of this could be optimized away, I suspect the `callvirt` may be the problem.
To make it faster, I added a `where T : IEquatable<T>` constraint to the method's definition. The result is:
```
L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)
```
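The switch from `Object::Equals(object)` to `IEquatable<T>::Equals(T)` in the IL is a consequence of C# overload resolution, not of anything the JIT decides. For an unconstrained `T`, the only `Equals` the compiler can see is `object.Equals(object)`, so `default(T)` has to be boxed to pass it (the `box !T` above); with `where T : IEquatable<T>`, the typed overload is also visible and is preferred because it needs no conversion. The sketch below uses a hypothetical `Probe` struct that counts which overload gets called.

```csharp
using System;

// Hypothetical struct that records which Equals overload is invoked.
public struct Probe : IEquatable<Probe>
{
    public static int ObjectCalls;
    public static int TypedCalls;

    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is Probe;
    }

    public bool Equals(Probe other)
    {
        TypedCalls++;
        return true;
    }

    public override int GetHashCode() => 0;
}

public static class OverloadDemo
{
    // Same shape as GenericsTest<T>.Check: binds to object.Equals(object).
    public static bool Unconstrained<T>(T value) => value.Equals(default(T));

    // With the constraint: binds to IEquatable<T>.Equals(T).
    public static bool Constrained<T>(T value) where T : IEquatable<T> => value.Equals(default(T));

    public static void Main()
    {
        Unconstrained(new Probe());
        Constrained(new Probe());
        // prints "Equals(object): 1, Equals(Probe): 1"
        Console.WriteLine("Equals(object): {0}, Equals(Probe): {1}", Probe.ObjectCalls, Probe.TypedCalls);
    }
}
```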
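Although I now understand the performance a bit better (it probably can't be inlined because it involves a vtable lookup), I'm still confused: why doesn't it simply call `T::Equals`? After all, I did specify that it would be there...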
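On that last point: per the CLI specification, the `constrained.` prefix in front of `callvirt` lets the runtime call a value type's own implementation of the method directly on the value, without boxing, so `callvirt` in the IL does not by itself imply a vtable lookup at runtime; whether the JIT then also inlines that call is a separate decision. If putting the constraint on the class is undesirable, a common alternative is `EqualityComparer<T>.Default`, which selects an `IEquatable<T>`-based comparer at runtime when `T` implements it. Newer .NET Core JITs are reported to devirtualize calls written as `EqualityComparer<T>.Default.Equals(...)` for value types, but treat that as an assumption to verify by measurement. `ComparerCheck<T>` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of GenericsTest<T> that needs no constraint on T.
public class ComparerCheck<T>
{
    public bool Check(T value)
    {
        // Called directly (not cached in a field) so the call site keeps the
        // EqualityComparer<T>.Default.Equals shape.
        return EqualityComparer<T>.Default.Equals(value, default(T));
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var check = new ComparerCheck<int>();
        Console.WriteLine(check.Check(0));  // True
        Console.WriteLine(check.Check(42)); // False
    }
}
```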