c# - Curiosity: Why does Expression<...> when compiled run faster than a minimal DynamicMethod?

Question

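I'm currently doing some last-measure optimizations, mostly for fun and learning, and discovered something that left me with a couple of questions.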

First, the questions:

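1. When I construct a method in memory with DynamicMethod and run under the debugger, is there any way to step into the generated assembly code in the disassembly view? The debugger seems to just step over the whole method.
2. Or, if that's not possible, can I somehow save the generated IL to disk as an assembly, so that I can inspect it with Reflector?
3. Why does the Expression<...> version of my simple addition method (Int32+Int32 => Int32) run faster than a minimal DynamicMethod version?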

Here's a short and complete program that demonstrates this. On my system, the output is:

DynamicMethod: 887 ms
Lambda: 1878 ms
Method: 1969 ms
Expression: 681 ms

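I expected the lambda and method calls to have higher values, but the DynamicMethod version is consistently about 30-50% slower than the Expression version (the variation is probably due to Windows and other programs). Does anyone know the reason?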

Here's the program:

using System;
using System.Linq.Expressions;
using System.Reflection.Emit;
using System.Diagnostics;

namespace Sandbox
{
    public class Program
    {
        public static void Main(String[] args)
        {
            DynamicMethod method = new DynamicMethod("TestMethod",
                typeof(Int32), new Type[] { typeof(Int32), typeof(Int32) });
            var il = method.GetILGenerator();

            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);

            Func<Int32, Int32, Int32> f1 =
                (Func<Int32, Int32, Int32>)method.CreateDelegate(
                    typeof(Func<Int32, Int32, Int32>));
            Func<Int32, Int32, Int32> f2 = (Int32 a, Int32 b) => a + b;
            Func<Int32, Int32, Int32> f3 = Sum;
            Expression<Func<Int32, Int32, Int32>> f4x = (a, b) => a + b;
            Func<Int32, Int32, Int32> f4 = f4x.Compile();
            for (Int32 pass = 1; pass <= 2; pass++)
            {
                // Pass 1 just runs all the code without writing out anything
                // to avoid JIT overhead influencing the results
                Time(f1, "DynamicMethod", pass);
                Time(f2, "Lambda", pass);
                Time(f3, "Method", pass);
                Time(f4, "Expression", pass);
            }
        }

        private static void Time(Func<Int32, Int32, Int32> fn,
            String name, Int32 pass)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (Int32 index = 0; index <= 100000000; index++)
            {
                Int32 result = fn(index, 1);
            }
            sw.Stop();
            if (pass == 2)
                Debug.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
        }

        private static Int32 Sum(Int32 a, Int32 b)
        {
            return a + b;
        }
    }
}

score 54 · Accepted Answer

A call to the method created via DynamicMethod goes through two thunks, while a call to the method created via Expression<> goes through none.

Here's how it works. This is the calling sequence for invoking fn(0, 1) in the Time method (I hard-coded the arguments to 0 and 1 for ease of debugging):

00cc032c 6a01            push    1           // 1 argument
00cc032e 8bcf            mov     ecx,edi
00cc0330 33d2            xor     edx,edx     // 0 argument
00cc0332 8b410c          mov     eax,dword ptr [ecx+0Ch]
00cc0335 8b4904          mov     ecx,dword ptr [ecx+4]
00cc0338 ffd0            call    eax // 1 arg on stack, two in edx, ecx

For the first invocation I investigated, DynamicMethod, the call eax line resolves like so:

00cc0338 ffd0            call    eax {003c2084}
0:000> !u 003c2084
Unmanaged code
003c2084 51              push    ecx
003c2085 8bca            mov     ecx,edx
003c2087 8b542408        mov     edx,dword ptr [esp+8]
003c208b 8b442404        mov     eax,dword ptr [esp+4]
003c208f 89442408        mov     dword ptr [esp+8],eax
003c2093 58              pop     eax
003c2094 83c404          add     esp,4
003c2097 83c010          add     eax,10h
003c209a ff20            jmp     dword ptr [eax]

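This appears to be doing some stack swizzling to rearrange the arguments. I speculate that it's owing to the difference between delegates that use the implicit "this" argument and those that don't.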

The jump at the end resolves like so:

003c209a ff20            jmp     dword ptr [eax]      ds:0023:012f7edc=0098c098
0098c098 e963403500      jmp     00ce0100

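The remainder of the code at 0098c098 looks like a JIT thunk whose start was rewritten with a jmp once the method had been JIT-compiled. It's only after this jump that we get to the real code: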

0:000> !u eip
Normal JIT generated code
DynamicClass.TestMethod(Int32, Int32)
Begin 00ce0100, size 5
>>> 00ce0100 03ca            add     ecx,edx
00ce0102 8bc1            mov     eax,ecx
00ce0104 c3              ret

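The invocation sequence for the method created via Expression<> is different - it's missing the stack swizzling code. Here it is, starting from the first call via eax: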

00cc0338 ffd0            call    eax {00ce00a8}

0:000> !u eip
Normal JIT generated code
DynamicClass.lambda_method(System.Runtime.CompilerServices.ExecutionScope, Int32, Int32)
Begin 00ce00a8, size b
>>> 00ce00a8 8b442404        mov     eax,dword ptr [esp+4]
00ce00ac 03d0            add     edx,eax
00ce00ae 8bc2            mov     eax,edx
00ce00b0 c20400          ret     4

Now, how did things get like this?

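1. Stack swizzling wasn't necessary (the implicit first argument from the delegate is actually used, i.e. unlike a delegate bound to a static method).
2. The JIT must have been forced by the LINQ compilation logic, so that the delegate held the real destination address rather than a fake one.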
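The first point can be observed from managed code without a debugger. The following is a minimal sketch, not part of the original measurements: the class and method names (DelegateShapes, BuildOpen, BuildFromExpression) are made up for illustration. It builds the same two delegates and prints whether each carries a bound first argument (Delegate.Target). I would expect the DynamicMethod delegate to report a null target and the compiled expression to report a non-null one (the ExecutionScope visible in the disassembly above, on that runtime); the concrete target type is a runtime implementation detail and may differ between versions.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection.Emit;

public static class DelegateShapes
{
    // Delegate bound to a static DynamicMethod taking only (int, int):
    // nothing is bound as an implicit first argument.
    public static Func<int, int, int> BuildOpen()
    {
        var method = new DynamicMethod("AddOpen", typeof(int),
            new Type[] { typeof(int), typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);
        return (Func<int, int, int>)method.CreateDelegate(
            typeof(Func<int, int, int>));
    }

    // Delegate produced by the expression compiler.
    public static Func<int, int, int> BuildFromExpression()
    {
        Expression<Func<int, int, int>> expr = (a, b) => a + b;
        return expr.Compile();
    }

    public static void Main()
    {
        Func<int, int, int> open = BuildOpen();
        Func<int, int, int> compiled = BuildFromExpression();

        // Target is the object passed as the hidden first argument, if any.
        Console.WriteLine("DynamicMethod target is null: " + (open.Target == null));
        Console.WriteLine("Expression target is null: " + (compiled.Target == null));
    }
}
```

This only shows the shape of the delegates; it says nothing about whether the bound address is a thunk or the final JIT-compiled code, which still needs the debugger session shown above.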

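I don't know how LINQ forced the JIT, but I do know how to force a JIT myself - by calling the function at least once. UPDATE: I found another way to force a JIT: pass true for the restrictedSkipVisibility argument of the constructor. So, here's modified code that eliminates the stack swizzling by using the implicit "this" parameter, and uses the alternate constructor to pre-compile so that the bound address is the real address rather than the thunk: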

using System;
using System.Linq.Expressions;
using System.Reflection.Emit;
using System.Diagnostics;

namespace Sandbox
{
    public class Program
    {
        public static void Main(String[] args)
        {
            DynamicMethod method = new DynamicMethod("TestMethod",
                typeof(Int32), new Type[] { typeof(object), typeof(Int32),
                typeof(Int32) }, true);
            var il = method.GetILGenerator();

            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldarg_2);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);

            Func<Int32, Int32, Int32> f1 =
                (Func<Int32, Int32, Int32>)method.CreateDelegate(
                    typeof(Func<Int32, Int32, Int32>), null);
            Func<Int32, Int32, Int32> f2 = (Int32 a, Int32 b) => a + b;
            Func<Int32, Int32, Int32> f3 = Sum;
            Expression<Func<Int32, Int32, Int32>> f4x = (a, b) => a + b;
            Func<Int32, Int32, Int32> f4 = f4x.Compile();
            for (Int32 pass = 1; pass <= 2; pass++)
            {
                // Pass 1 just runs all the code without writing out anything
                // to avoid JIT overhead influencing the results
                Time(f1, "DynamicMethod", pass);
                Time(f2, "Lambda", pass);
                Time(f3, "Method", pass);
                Time(f4, "Expression", pass);
            }
        }

        private static void Time(Func<Int32, Int32, Int32> fn,
            String name, Int32 pass)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (Int32 index = 0; index <= 100000000; index++)
            {
                Int32 result = fn(index, 1);
            }
            sw.Stop();
            if (pass == 2)
                Console.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
        }

        private static Int32 Sum(Int32 a, Int32 b)
        {
            return a + b;
        }
    }
}

Here are the runtimes on my system:

DynamicMethod: 312 ms
Lambda: 417 ms
Method: 417 ms
Expression: 312 ms

UPDATE TO ADD:

I tried running this code on my new system, a Core i7 920 running Windows 7 x64 with .NET 4 beta 2 installed (mscoree.dll version 4.0.30902), and the results are, well, variable.

csc 3.5, /platform:x86, runtime v2.0.50727 (via .config)

Run #1
DynamicMethod: 214 ms
Lambda: 571 ms
Method: 570 ms
Expression: 249 ms

Run #2
DynamicMethod: 463 ms
Lambda: 392 ms
Method: 392 ms
Expression: 463 ms

Run #3
DynamicMethod: 463 ms
Lambda: 570 ms
Method: 570 ms
Expression: 463 ms

Perhaps this is Intel SpeedStep affecting the results, or possibly Turbo Boost. In any case, it's very annoying.

csc 3.5, /platform:x64, runtime v2.0.50727 (via .config)
DynamicMethod: 428 ms
Lambda: 392 ms
Method: 392 ms
Expression: 428 ms

csc 3.5, /platform:x64, runtime v4
DynamicMethod: 428 ms
Lambda: 356 ms
Method: 356 ms
Expression: 428 ms

csc 4, /platform:x64, runtime v4
DynamicMethod: 428 ms
Lambda: 356 ms
Method: 356 ms
Expression: 428 ms

csc 4, /platform:x86, runtime v4
DynamicMethod: 463 ms
Lambda: 570 ms
Method: 570 ms
Expression: 463 ms

csc 3.5, /platform:x86, runtime v4
DynamicMethod: 214 ms
Lambda: 570 ms
Method: 571 ms
Expression: 249 ms

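Many of these results will be accidents of timing, caused by whatever is producing the random speedups in the C# 3.5 / runtime v2.0 scenario. I'll have to reboot to see whether SpeedStep or Turbo Boost is responsible for these effects.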
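One way to make such accidents of timing easier to spot is to repeat each measurement and report the median rather than a single run. The sketch below is an assumption on my part, not what was used for the numbers above; the names RepeatedTiming, TimeOnce, Median and MedianTime are hypothetical. It does not remove frequency-scaling effects, it only makes outliers less likely to be reported as the result.

```csharp
using System;
using System.Diagnostics;

public static class RepeatedTiming
{
    // Times `iterations` calls of fn and returns elapsed milliseconds.
    public static long TimeOnce(Func<int, int, int> fn, int iterations)
    {
        int sink = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Accumulate the result so the call is not trivially dead code.
            sink = unchecked(sink + fn(i, 1));
        }
        sw.Stop();
        GC.KeepAlive(sink);
        return sw.ElapsedMilliseconds;
    }

    // Median of the samples; the input array is left unmodified.
    public static long Median(long[] samples)
    {
        if (samples == null || samples.Length == 0)
            throw new ArgumentException("At least one sample is required.");
        long[] sorted = (long[])samples.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Calls fn once as a warm-up, then returns the median of `trials` timed runs.
    public static long MedianTime(Func<int, int, int> fn, int iterations, int trials)
    {
        fn(0, 1); // warm-up call so JIT cost is not part of the first trial
        long[] samples = new long[trials];
        for (int t = 0; t < trials; t++)
            samples[t] = TimeOnce(fn, iterations);
        return Median(samples);
    }

    public static void Main()
    {
        Func<int, int, int> add = (a, b) => a + b;
        Console.WriteLine("Lambda median: " + MedianTime(add, 100000000, 5) + " ms");
    }
}
```

If the median and the individual samples disagree widely, that points at something external to the code under test, such as the clock-speed changes suspected above.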
