c++ - Why does native C++ perform poorly with C++ interop?

Question

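I've posted some code below to test the performance (in milliseconds) of calling methods in native C++ and in C# from C++/CLI, using Visual Studio 2010. I have a separate native C++ project that is compiled into a DLL. When I call C++ from C++, I get the expected result: it is much faster (about 4x) than the managed equivalent. However, when I call C++ from C++/CLI, performance is 10x slower.

Is this the expected behavior when calling native C++ from C++/CLI? I was under the impression that there shouldn't be a significant difference, but this simple test shows otherwise. Could this be a difference in optimization between the C++ and C++/CLI compilers?

Update

I made some updates to the .cpp so that I'm no longer calling the method in a tight loop (as Reed Copsey pointed out), and it turns out that the performance difference is negligible or very small. Of course, this depends on how the interop is done.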

.h

#ifndef CPPOBJECT_H
#define CPPOBJECT_H

#ifdef CPLUSPLUSOBJECT_EXPORTING
    #define CLASS_DECLSPEC __declspec(dllexport)
#else
    #define CLASS_DECLSPEC __declspec(dllimport)
#endif

class CLASS_DECLSPEC CPlusPlusObject
{
public:
    CPlusPlusObject(){}
    ~CPlusPlusObject(){}

    void sayHello();
    double getSqrt(double n);
    // Update
    double wasteSomeTimeWithSqrt(double n);
};

#endif

.cpp

#include "CPlusPlusObject.h"
#include <cmath>     // std::sqrt
#include <iostream>  // std::cout

void CPlusPlusObject::sayHello(){std::cout << "Hello";}
double CPlusPlusObject::getSqrt(double n) {return std::sqrt(n);}
double CPlusPlusObject::wasteSomeTimeWithSqrt(double n)
{
    double result = 0;
    for (int x = 0; x < 10000000; x++)
    {
        result += std::sqrt(n);
    }
    return result;
}

c++/cli

const unsigned set = 100;
const unsigned repetitions = 1000000;
double cppcliTocpp()
{
    double n = 0;
    System::Diagnostics::Stopwatch^ stopWatch = gcnew System::Diagnostics::Stopwatch();

    stopWatch->Start();
    while (stopWatch->ElapsedMilliseconds < 1200){n+=0.001;}
    stopWatch->Reset();

    for (int x = 0; x < set; x++)
    {       
        stopWatch->Start();
        CPlusPlusObject cplusplusObject;
        n += cplusplusObject.wasteSomeTimeWithSqrt(123.456);
        /*for (int i = 0; i < repetitions; i++)
        {
            n += cplusplusObject.getSqrt(123.456);
        }*/
        stopWatch->Stop();
        System::Console::WriteLine("c++/cli call to native c++ took " + stopWatch->ElapsedMilliseconds + "ms.");
        stopWatch->Reset();
    }
    return n;
}

double cppcliTocSharp()
{
    double n = 0;
    System::Diagnostics::Stopwatch^ stopWatch = gcnew System::Diagnostics::Stopwatch();

    stopWatch->Start();
    while (stopWatch->ElapsedMilliseconds < 1200){n+=0.001;}
    stopWatch->Reset();

    for (int x = 0; x < set; x++)
    {       
        stopWatch->Start();
        CSharp::CSharpObject^ cSharpObject = gcnew CSharp::CSharpObject();
        for (int i = 0; i < repetitions; i++)
        {
            n += cSharpObject->GetSqrt(123.456);
        }
        stopWatch->Stop();
        System::Console::WriteLine("c++/cli call to c# took " + stopWatch->ElapsedMilliseconds + "ms.");
        stopWatch->Reset();
    }
    return n;
}

double cppcli()
{
    double n = 0;
    System::Diagnostics::Stopwatch^ stopWatch = gcnew System::Diagnostics::Stopwatch();

    stopWatch->Start();
    while (stopWatch->ElapsedMilliseconds < 1200){n+=0.001;}
    stopWatch->Reset();

    for (int x = 0; x < set; x++)
    {       
        stopWatch->Start();
        CPlusPlusCliObject cPlusPlusCliObject;
        for (int i = 0; i < repetitions; i++)
        {
            n += cPlusPlusCliObject.getSqrt(123.456);
        }
        stopWatch->Stop();
        System::Console::WriteLine("c++/cli took " + stopWatch->ElapsedMilliseconds + "ms.");
        stopWatch->Reset();
    }
    return n;
}

int main() 
{
    double n = 0;
    n += cppcliTocpp();
    n += cppcliTocSharp();
    n += cppcli();
    System::Console::WriteLine(n);
    System::Console::ReadKey();
}

score 4 · Accepted Answer

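However, when I call C++ from C++/CLI, performance is 10x slower.

Bridging the CLR and native code requires marshaling. There will always be some overhead on every method call when going from C++/CLI into a native method.

The only reason the overhead looks so large (in this case) is that you are calling a very fast method in a tight loop. If you batch the calls, or call a method that takes significantly longer to run, you'll find that the overhead is quite small.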
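The batching idea can be sketched on the native side roughly as follows. This is only an illustration: `sqrtOne`, `sumSqrtBatch`, `sumChatty`, and `sumBatched` are hypothetical names that do not appear in the question's code, and the sketch is plain C++ with no actual managed caller. It assumes each call to an exported function would cost one managed-to-native transition, so the chatty version would pay that cost once per element and the batched version once in total.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "chatty" entry point: one boundary crossing per element.
double sqrtOne(double value)
{
    return std::sqrt(value);
}

// Hypothetical batched entry point: one boundary crossing for the whole array.
double sumSqrtBatch(const double* values, std::size_t count)
{
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
    {
        total += std::sqrt(values[i]);
    }
    return total;
}

// Caller side, chatty: would pay the transition cost on every iteration.
double sumChatty(const std::vector<double>& values)
{
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        total += sqrtOne(values[i]);
    }
    return total;
}

// Caller side, batched: would pay the transition cost once.
double sumBatched(const std::vector<double>& values)
{
    return sumSqrtBatch(values.data(), values.size());
}
```

Both callers compute the same sum; only the number of calls across the boundary differs, which is the quantity the per-call overhead scales with.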

score 1 · Accepted Answer

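These micro-benchmarks are very dangerous. You made an effort to avoid the typical benchmarking mistakes but still fell into a classic trap. Your intention was to measure method call overhead, but that is not what actually happens. The jitter optimizer is capable of using standard code optimization techniques such as code hoisting and method inlining. You can only really see this when you look at the generated machine code: Debug + Windows + Disassembly window.

I tested this with VS2012, using a 32-bit Release build with the jitter optimizer enabled. The C++/CLI code is the fastest, taking about 128 msec: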
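For comparison, here is a minimal native-side sketch of a timing loop that tries to keep the optimizer from removing the work being measured. It is an assumption-laden illustration, not the method used in this answer: `timeSqrtLoopMs` is a hypothetical helper, it needs a C++11 compiler for `<chrono>`, and it relies on the compiler being required to perform every `volatile` read. It says nothing about how the JIT treats managed code.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: times `repetitions` sqrt calls and returns elapsed
// milliseconds. The accumulated sum is handed back through `result` so the
// loop has an observable effect and cannot simply be discarded.
double timeSqrtLoopMs(unsigned repetitions, double& result)
{
    // Reading through a volatile forces a fresh load on every iteration,
    // so the sqrt cannot be hoisted out of the loop as loop-invariant.
    volatile double input = 123.456;
    double sum = 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < repetitions; ++i)
    {
        sum += std::sqrt(input);
    }
    const auto stop = std::chrono::steady_clock::now();

    result = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Even with this guard the sqrt call can still be inlined, so the number it reports is the cost of the loop body rather than the cost of a function call.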

000000bf  fld         qword ptr ds:[01212078h] 
000000c5  fsqrt 
000000c7  fstp        qword ptr [ebp-20h] 
//
// stopWatch->Start() call elided...
//
            n += cPlusPlusCliObject.getSqrt(123.456);
000000f5  fld         qword ptr [ebp-20h] 
000000f8  fadd        qword ptr [ebp-14h] 
000000fb  fstp        qword ptr [ebp-14h] 
        for (int i = 0; i < repetitions; i++)
000000fe  dec         eax 
000000ff  jne         000000F5

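In other words, the std::sqrt() call got hoisted out of the loop, and the inner loop merely performs additions with the resulting value. There is no method call. Also note that it doesn't actually measure the time needed for the sqrt() call :)

The loop that uses the C# method call is a bit slower, taking about 180 msec: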
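Written out by hand, the transformation visible in the disassembly looks roughly like this. The function names `sumNaive` and `sumHoisted` are hypothetical and only serve to contrast the two shapes; this is a source-level sketch of loop-invariant hoisting, not the code the jitter actually emits.

```cpp
#include <cmath>

// What the benchmark loop asks for: one sqrt per iteration.
double sumNaive(double value, int repetitions)
{
    double n = 0.0;
    for (int i = 0; i < repetitions; ++i)
    {
        n += std::sqrt(value);
    }
    return n;
}

// What the optimizer effectively produced: sqrt computed once before the
// loop, with only the addition left inside it.
double sumHoisted(double value, int repetitions)
{
    const double root = std::sqrt(value);
    double n = 0.0;
    for (int i = 0; i < repetitions; ++i)
    {
        n += root;
    }
    return n;
}
```

The two functions return the same value, but the second performs a single sqrt regardless of `repetitions`, so timing it measures additions only.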

000000ea  fld         qword ptr ds:[01211EC0h] 
000000f0  fsqrt 
000000f2  fadd        qword ptr [ebp-14h] 
000000f5  fstp        qword ptr [ebp-14h] 
        for (int i = 0; i < repetitions; i++)
000000f8  dec         eax 
000000f9  jne         000000EA

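Just an inlined method call to Math::Sqrt(); it did not get hoisted. Not actually sure why; the jitter optimizer does work under a time constraint, which may be the reason.

And I won't post the code for the interop call. But yes, that takes about 380 msec, because the function call actually has to be made (unmanaged code cannot be inlined) and because of the thunk that is required to prevent the garbage collector from blundering into the unmanaged stack frames. The thunk is pretty fast, taking only a handful of nanoseconds, but that just can't compete with the jitter optimizer directly inlining fadd or fsqrt.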
