According to the MSDN documentation, __faststorefence is faster than _mm_sfence. In my timings, it is more than three times slower.

Platform: Win7-64, Visual Studio 2010 with the x64 SDK.

#include <windows.h>
#include <xmmintrin.h>
#include <intrin.h>
#include <iostream>

int main(int argc, char* argv[])
{
    int* x = new int;
    __int64 loops = 1000000000; // 1 billion
    __int64 start, elapsed;

    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        _mm_sfence();
    }
    elapsed = __rdtsc() - start;

    std::cout << "_mm_sfence: " << elapsed << std::endl
              << "average   : " << double(elapsed) / double(loops) << std::endl;

    start = __rdtsc();
    for(__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        __faststorefence();
    }
    elapsed = __rdtsc() - start;

    std::cout << "__faststorefence: " << elapsed << std::endl
              << "average          : " << double(elapsed) / double(loops) << std::endl;
}

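Results (average TSC ticks per iteration):

  • _mm_sfence average: 5.7
  • __faststorefence average: 18.37

__faststorefence generates lock or DWORD PTR [rsp], ebp, where ebp has already been XORed to zero, and _mm_sfence generates sfence (as expected).

The MSDN documentation for __faststorefence explicitly states that it is faster than _mm_sfence. Is my test wrong or flawed? Any ideas?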

2 Answers

On the AMD processors I tested with the provided benchmark, __faststorefence is the winner.

Intel - _mm_sfence: 8.61, __faststorefence: 21.60
AMD 1 - _mm_sfence: 138.21, __faststorefence: 90.96
AMD 2 - _mm_sfence: 55.21, __faststorefence: 20.08

This is with VS 2013.
_mm_sfence = sfence
__faststorefence = lock or dword ptr [rsp],esi

Answered 2014-06-07T15:59:15.577

You can't compare __faststorefence (a full fence) with _mm_sfence (a store fence).

You need to compare __faststorefence (a full fence) with _mm_mfence (the "m" fence, also a full fence).
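
A like-for-like comparison along those lines might look like the sketch below. It is only a rough illustration, assuming an x86-64 target. The helper names fence_mfence, fence_locked and average_ticks are made up for this example and do not come from the question. On non-MSVC compilers, where __faststorefence does not exist, a sequentially consistent fetch_or on a local atomic is used as a stand-in. It usually compiles to a locked instruction, but the compiler is free to pick something else.

```cpp
#include <cstdint>
#include <emmintrin.h>    // _mm_mfence (SSE2)
#if defined(_MSC_VER)
#include <intrin.h>       // __faststorefence, __rdtsc (MSVC, x64 only)
#else
#include <atomic>
#include <x86intrin.h>    // __rdtsc on GCC/Clang
#endif

// Full fence via the mfence instruction.
inline void fence_mfence()
{
    _mm_mfence();
}

// Full fence via a locked read-modify-write.
inline void fence_locked()
{
#if defined(_MSC_VER)
    __faststorefence();
#else
    // Assumption: stand-in for __faststorefence on non-MSVC compilers.
    // A seq_cst RMW on a stack variable acts as a full barrier on x86.
    std::atomic<int> dummy{0};
    dummy.fetch_or(0, std::memory_order_seq_cst);
#endif
}

// Hypothetical helper: average TSC ticks per "store + fence" iteration.
template <typename Fence>
double average_ticks(Fence fence, std::int64_t loops)
{
    volatile int x = 0;   // volatile so the store is not optimized away
    const std::uint64_t start = __rdtsc();
    for (std::int64_t i = 0; i < loops; ++i)
    {
        x = 0;
        fence();
    }
    const std::uint64_t elapsed = __rdtsc() - start;
    return static_cast<double>(elapsed) / static_cast<double>(loops);
}
```

Calling average_ticks(fence_mfence, loops) and average_ticks(fence_locked, loops) with the same loop count gives two numbers that measure the same kind of barrier. As the other answer's figures suggest, the ranking can still differ from one CPU to another.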

Answered 2015-03-20T22:55:31.250