.net - Performance of C++/CLI function pointers versus .NET delegates

Question

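For my C++/CLI project, I tried to measure the cost of C++/CLI function pointers versus .NET delegates.

I expected C++/CLI function pointers to be faster than .NET delegates. So my test counts how many times a .NET delegate and a native function pointer can each be invoked within 5 seconds.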

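Results

The results shocked me (and still do):

.NET delegate: 910M executions with result 152080413333030 in 5003ms
Function pointer: 347M executions with result 57893422166551 in 5013ms

This means that calling through a native C++/CLI function pointer is almost 3 times slower than calling through a managed delegate from within C++/CLI code. How can that be? Should I prefer managed constructs when using interfaces, delegates, or abstract classes in performance-critical sections?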

Test code

The function that is called continuously:

__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}

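The code that calls the method tries to use all the parameters as well as the return value, so that nothing gets optimized away (hopefully). Here is the code (for the .NET delegate):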

__int64 executions;
__int64 result;
System::Diagnostics::Stopwatch^ w = gcnew System::Diagnostics::Stopwatch();

System::Func<int, __int64, __int64>^ managedPtr = gcnew System::Func<int, __int64, __int64>(&DoIt);
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += managedPtr(i, executions);
    executions++;
}
System::Console::WriteLine(".NET delegate:       {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);

The C++ function pointer is tested in the same way as the .NET delegate:

typedef __int64 (* DoItMethod)(int n, __int64 sum);

DoItMethod nativePtr = DoIt;
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += nativePtr(i, executions);
    executions++;
}
System::Console::WriteLine("Function pointer:    {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);
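For readers who want to try the same "count batches of calls within a time budget" approach outside of C++/CLI, here is a minimal sketch in standard C++. It is not the code from the question: `BenchResult` and `RunFor` are hypothetical names, `std::chrono::steady_clock` stands in for `Stopwatch`, and `std::int64_t` stands in for `__int64`.

```cpp
#include <chrono>
#include <cstdint>

// Same body as the DoIt function in the question, written with a fixed-width type.
std::int64_t DoIt(int n, std::int64_t sum)
{
    return (n % 3) == 0 ? sum + n : sum + 1;
}

// Hypothetical helper type: what one timed run produced.
struct BenchResult
{
    std::int64_t batches;   // completed batches of batchSize calls each
    std::int64_t result;    // accumulated return values, kept so the calls stay observable
    std::int64_t elapsedMs; // wall-clock time actually spent
};

// Calls `call` in batches of batchSize until `budget` has elapsed.
// The clock is only read once per batch, as in the question's loop.
template <typename Callable>
BenchResult RunFor(Callable call, std::chrono::milliseconds budget, int batchSize = 1000000)
{
    using Clock = std::chrono::steady_clock;
    BenchResult r{0, 0, 0};
    const Clock::time_point start = Clock::now();
    while (Clock::now() - start < budget)
    {
        for (int i = 0; i < batchSize; i++)
            r.result += call(i, r.batches);
        r.batches++;
    }
    r.elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - start).count();
    return r;
}
```

A call such as `RunFor(&DoIt, std::chrono::milliseconds(5000))` mirrors the 5 second runs above. Note that with a template parameter an optimizing compiler can often see the call target and inline it, so this sketch may measure something closer to a direct call than to a true indirect call.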

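Additional information

Compiled with Visual Studio 2012
Target framework is .NET Framework 4.5
Release build (the execution counts stay proportional for the debug build)
Calling convention is __stdcall (__fastcall is not allowed when the project is compiled with CLR support)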

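All tests performed:

.NET virtual method: 1025M executions with result 171358304166325 in 5004ms
.NET delegate: 910M executions with result 152080413333030 in 5003ms
Virtual method: 336M executions with result 56056335999888 in 5006ms
Function pointer: 347M executions with result 57893422166551 in 5013ms
Function call: 1459M executions with result 244230520832847 in 5001ms
Inlined function: 1385M executions with result 231791984166205 in 5000ms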

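The direct call to "DoIt" is represented here by "Function call". It seems to get inlined by the compiler, since there is no (significant) difference in execution count compared to the call to the inlined function.

A call to a C++ virtual method is just as "slow" as the function pointer. A virtual method of a managed class (ref class) is as fast as the .NET delegate.

Update: I dug a little deeper, and it seems that for the tests with unmanaged functions, a transition to native code happens on every call to the DoIt function. So I wrapped the inner loop in another function, which I forced to be compiled as unmanaged: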

#pragma managed(push, off)
__int64 TestCall(__int64* executions)
{
    __int64 result = 0;
    for (int i=0; i < 1000000; i++)
        result += DoItNative(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)

I also tested std::function like this:

#pragma managed(push, off)
__int64 TestStdFunc(__int64* executions)
{
    __int64 result = 0;
    std::function<__int64(int, __int64)> func(DoItNative);
    for (int i=0; i < 1000000; i++)
        result += func(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)

Now, the new results are:

Function call: 2946M executions with result 495340439997054 in 5000ms
std::function: 160M executions with result 26679519999840 in 5018ms

std::function is a bit disappointing.
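As a side note on the std::function number: a `std::function` call typically goes through a type-erased wrapper, which adds at least one extra indirection compared to a plain function pointer and tends to block inlining. The sketch below, in standard C++, only shows that both forms compute the same batch result, so any timing difference comes from call overhead. `BatchViaPointer` and `BatchViaStdFunction` are hypothetical names, and the body of `DoItNative` is an assumption (it is not shown in the question; here it is taken to match `DoIt`).

```cpp
#include <cstdint>
#include <functional>

// Assumed to have the same body as DoIt; the question does not show DoItNative.
std::int64_t DoItNative(int n, std::int64_t sum)
{
    return (n % 3) == 0 ? sum + n : sum + 1;
}

typedef std::int64_t (*DoItMethod)(int n, std::int64_t sum);

// One batch of `count` calls through a plain function pointer.
std::int64_t BatchViaPointer(DoItMethod fn, int count, std::int64_t executions)
{
    std::int64_t result = 0;
    for (int i = 0; i < count; i++)
        result += fn(i, executions);
    return result;
}

// The same batch, but each call goes through std::function's type-erased wrapper.
std::int64_t BatchViaStdFunction(const std::function<std::int64_t(int, std::int64_t)>& fn,
                                 int count, std::int64_t executions)
{
    std::int64_t result = 0;
    for (int i = 0; i < count; i++)
        result += fn(i, executions);
    return result;
}
```

Timing these two functions with the same `count` would isolate the wrapper cost. How large that cost is depends on the standard library implementation and compiler version, so the 160M figure above should not be assumed to carry over to other toolchains.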

score 17 · Accepted Answer

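You are seeing the cost of "double thunking". The core problem with your DoIt() function is that it is being compiled as managed code. The delegate call is very fast, because going from managed code to managed code through a delegate is uncomplicated. The function pointer is slow, however: the compiler automatically generates code that first switches from managed code to unmanaged code and then makes the call through the function pointer. That call then ends up in a stub that switches from unmanaged code back to managed code and calls DoIt().

Presumably what you really meant to measure was a call to native code. Use a #pragma to force DoIt() to be generated as machine code, like this: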

#pragma managed(push, off)
__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}
#pragma managed(pop)

You will now see that the function pointer is faster than the delegate.
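Combining this with the update in the question, one possible arrangement is to keep both the callee and the hot loop inside the unmanaged region, so that a managed caller pays for a transition once per batch rather than once per call. The following is only a sketch under that assumption: `DoItUnmanaged`, `DoItUnmanagedPtr`, and `RunBatchUnmanaged` are hypothetical names, and the `_MANAGED` guard (a macro MSVC predefines under /clr) is there so the same file also builds with a compiler that has no CLR support.

```cpp
#include <cstdint>

// _MANAGED is predefined by MSVC when compiling with /clr.
// Other compilers never see the pragmas and build everything as native code anyway.
#if defined(_MANAGED)
#pragma managed(push, off)
#endif

// Generated as machine code, so calling it from native code involves no thunk.
std::int64_t DoItUnmanaged(int n, std::int64_t sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}

typedef std::int64_t (*DoItUnmanagedPtr)(int n, std::int64_t sum);

// Hypothetical helper: runs one whole batch of pointer calls in native code.
// A managed caller crosses the managed/unmanaged boundary once to get here.
std::int64_t RunBatchUnmanaged(DoItUnmanagedPtr fn, int count, std::int64_t executions)
{
    std::int64_t result = 0;
    for (int i = 0; i < count; i++)
        result += fn(i, executions);
    return result;
}

#if defined(_MANAGED)
#pragma managed(pop)
#endif
```

From managed code, the timing loop would then call `RunBatchUnmanaged(&DoItUnmanaged, 1000000, executions)` once per iteration and add the return value to `result`.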
