According to the MSDN documentation, __faststorefence is faster than _mm_sfence. In my timings, it is more than three times slower.
Platform: Win7-64, Visual Studio 2010 with the x64 SDK.
#include <windows.h>
#include <xmmintrin.h>
#include <intrin.h>
#include <iostream>

int main(int argc, char* argv[])
{
    int* x = new int;
    __int64 loops = 1000000000; // 1 billion
    __int64 start, elapsed;

    // Time a store followed by _mm_sfence.
    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        _mm_sfence();
    }
    elapsed = __rdtsc() - start;
    std::cout << "_mm_sfence: " << elapsed << std::endl
              << "average : " << double(elapsed) / double(loops) << std::endl;

    // Time a store followed by __faststorefence.
    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        __faststorefence();
    }
    elapsed = __rdtsc() - start;
    std::cout << "__faststorefence: " << elapsed << std::endl
              << "average : " << double(elapsed) / double(loops) << std::endl;

    delete x;
    return 0;
}
Results (TSC ticks per iteration):
- _mm_sfence average: 5.7
- __faststorefence average: 18.37
__faststorefence generates lock or DWORD PTR [rsp], ebp, where ebp has been XORed to zero, and _mm_sfence generates sfence (as expected).
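To rule out a one-off effect such as warm-up or frequency scaling, the measurement could be repeated several times and the best run kept. Below is a minimal sketch of that idea, not the code that produced the numbers above. The helper name best_ns_per_iteration and its parameters are hypothetical, and it uses std::chrono::steady_clock instead of __rdtsc (an assumption made so that it compiles outside MSVC), so it reports nanoseconds rather than TSC ticks.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the original test).
// Runs `body` `loops` times per trial and returns the lowest average
// time per iteration, in nanoseconds, seen over `trials` trials.
template <typename Body>
double best_ns_per_iteration(Body body, std::int64_t loops, int trials)
{
    assert(loops > 0 && trials > 0);

    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::max();

    for (int t = 0; t < trials; ++t)
    {
        const clock::time_point start = clock::now();
        for (std::int64_t i = 0; i < loops; ++i)
        {
            body();
        }
        // Convert the elapsed time to fractional nanoseconds.
        const std::chrono::duration<double, std::nano> elapsed = clock::now() - start;
        const double average = elapsed.count() / static_cast<double>(loops);

        // Keep the fastest trial; slower ones likely include noise.
        if (average < best)
        {
            best = average;
        }
    }
    return best;
}
```

With MSVC, each loop body from the test could be passed as a lambda, for example best_ns_per_iteration([x] { *x = 0; _mm_sfence(); }, 100000000, 5), and likewise for __faststorefence. Taking the minimum rather than the mean is a design choice: it discards trials disturbed by interrupts or scheduling.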
The MSDN documentation for __faststorefence explicitly states that it is faster than _mm_sfence. Is my test wrong, or is the documentation wrong? Any ideas?