SSE2 has direct support for some 64-bit integer operations:
Set both elements to 0:
__m128i z = _mm_setzero_si128();
Set both elements to 1:
__m128i z = _mm_set1_epi64x(1); // also works for variables.
__m128i z = _mm_set_epi64x(hi, lo); // elements can be different
__m128i z = _mm_set_epi32(0,1,0,1); // if any compilers refuse int64_t in 32-bit mode. (None of the major ones do.)
Set/load the low 64 bits, zero-extending to __m128i:
// supported even in 32-bit mode, and listed as an intrinsic for MOVQ
// so it should be atomic on aligned integers.
_mm_loadl_epi64((const __m128i*)p); // movq or movsd 64-bit load
_mm_cvtsi64x_si128(a); // only ICC, others refuse in 32-bit mode
_mm_loadl_epi64((const __m128i*)&a); // portable for a value instead of pointer
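Code based on _mm_set_epi32 can get compiled to a mess by some compilers, so _mm_loadl_epi64 appears to be the best bet across MSVC and ICC as well as gcc/clang, and it should actually be safe for your requirement of atomic 64-bit loads in 32-bit mode. See it on the Godbolt compiler explorer.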
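As a minimal sketch of the zero-extending load described above (the helper names `load_low64` and `store_lanes` are made up for illustration and are not part of any library), the following loads one 64-bit value and copies both lanes back out so the zeroed upper half can be checked:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>      // only needed if you check results with assert()

// Hypothetical helper: load one 64-bit value into the low lane.
// The high 64 bits of the result are zero.
static inline __m128i load_low64(const uint64_t* p) {
    return _mm_loadl_epi64(reinterpret_cast<const __m128i*>(p));
}

// Hypothetical helper: copy both 64-bit lanes to memory for inspection.
static inline void store_lanes(__m128i v, uint64_t out[2]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

After `store_lanes(load_low64(&x), out)`, `out[0]` should hold `x` and `out[1]` should be 0.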
Vertically add/subtract each 64-bit integer:
__m128i z = _mm_add_epi64(x,y);
__m128i z = _mm_sub_epi64(x,y);
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_integer_arithmetic.htm#intref_sse2_integer_arithmetic
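To illustrate what "vertical" means here, this sketch adds two pairs of 64-bit integers and shows that each lane wraps on its own, with no carry into its neighbour. The `Pair64` type and `add_pairs` function are assumptions made for the example only.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>      // only needed if you check results with assert()

// Hypothetical helper type: two 64-bit integers, low lane first.
struct Pair64 { uint64_t lo, hi; };

// Add the two lanes independently: r.lo = x.lo + y.lo, r.hi = x.hi + y.hi.
static inline Pair64 add_pairs(Pair64 x, Pair64 y) {
    __m128i vx = _mm_set_epi64x((long long)x.hi, (long long)x.lo);
    __m128i vy = _mm_set_epi64x((long long)y.hi, (long long)y.lo);
    __m128i vz = _mm_add_epi64(vx, vy);
    Pair64 r;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(&r), vz);
    return r;
}
```

Adding `{0xFFFFFFFFFFFFFFFF, 5}` and `{1, 7}` wraps the low lane to 0 and leaves the high lane at 12, so this is not a 128-bit add.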
Left shift:
__m128i z = _mm_slli_epi64(x,i); // i must be an immediate
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_shift.htm
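If the shift count is only known at run time, SSE2 also has _mm_sll_epi64, which takes the count from the low 64 bits of a second vector. A small sketch, where the wrapper name `shl64_var` is hypothetical:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>      // only needed if you check results with assert()

// Hypothetical wrapper: shift both 64-bit lanes left by a run-time count.
// Counts above 63 produce all-zero lanes.
static inline __m128i shl64_var(__m128i x, int n) {
    return _mm_sll_epi64(x, _mm_cvtsi32_si128(n));
}
```

Both lanes are shifted by the same amount; SSE2 has no per-lane variable shift.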
Bitwise operators:
__m128i z = _mm_and_si128(x,y);
__m128i z = _mm_or_si128(x,y);
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_integer_logical.htm
SSE doesn't have an increment, so you'll have to use an add with 1.
Comparisons are harder since there is no 64-bit support until SSE4.1 (pcmpeqq) and SSE4.2 (pcmpgtq).
Here's the one for equality:
__m128i t = _mm_cmpeq_epi32(a,b);
__m128i z = _mm_and_si128(t,_mm_shuffle_epi32(t,177));
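This will set each 64-bit element to 0xffffffffffffffff (aka -1) if they are equal. If you want it as a 0 or 1 in an int, you can pull it out using _mm_cvtsi128_si32() and mask it with & 1. (But sometimes you can do total -= cmp_result; instead of converting and adding.)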
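The subtract-the-mask trick can be sketched as follows: since a matching lane is -1, subtracting the compare result from a running total adds 1 per match. The function names and the assumption that `n` is even are mine, for illustration only.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>
#include <cassert>      // only needed if you check results with assert()

// 64-bit equality built from 32-bit compares: a lane is all-ones only if
// both of its 32-bit halves compared equal.
static inline __m128i cmpeq64_sse2(__m128i a, __m128i b) {
    __m128i t = _mm_cmpeq_epi32(a, b);
    return _mm_and_si128(t, _mm_shuffle_epi32(t, 177));  // 177 swaps halves
}

// Hypothetical helper: count positions where a[i] == b[i].
// Assumes n is even; a trailing odd element would be ignored.
static inline uint64_t count_equal(const uint64_t* a, const uint64_t* b, size_t n) {
    __m128i total = _mm_setzero_si128();
    for (size_t i = 0; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // total -= mask: subtracting -1 increments that lane's counter.
        total = _mm_sub_epi64(total, cmpeq64_sse2(va, vb));
    }
    uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), total);
    return lanes[0] + lanes[1];
}
```

The two per-lane counters are only combined once, after the loop, which keeps the conversion to a scalar out of the hot path.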
And less-than: (not fully tested)
a = _mm_xor_si128(a,_mm_set1_epi32(0x80000000));
b = _mm_xor_si128(b,_mm_set1_epi32(0x80000000));
__m128i t = _mm_cmplt_epi32(a,b);
__m128i u = _mm_cmpgt_epi32(a,b);
__m128i z = _mm_or_si128(t,_mm_shuffle_epi32(t,177));
z = _mm_andnot_si128(_mm_shuffle_epi32(u,245),z);
This will set each 64-bit element to 0xffffffffffffffff if the corresponding element in a is less than the one in b.
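As far as I can tell, XORing every 32-bit lane with 0x80000000 makes the code above order the elements as unsigned 64-bit values. If a signed comparison is wanted, one possible variant (my own sketch, not part of the original code, and only lightly tested) flips the sign bit of the low halves only, so the high halves keep their signed meaning. The name `cmplt64_signed_sse2` is hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>      // only needed if you check results with assert()

// Signed 64-bit a < b per lane, using only SSE2 32-bit compares.
static inline __m128i cmplt64_signed_sse2(__m128i a, __m128i b) {
    // Flip the sign bit of each LOW dword so the low halves compare as
    // unsigned; the high dwords are left alone and compare as signed.
    const __m128i flip = _mm_set_epi32(0, INT32_MIN, 0, INT32_MIN);
    a = _mm_xor_si128(a, flip);
    b = _mm_xor_si128(b, flip);
    __m128i t = _mm_cmplt_epi32(a, b);  // per-dword a < b
    __m128i u = _mm_cmpgt_epi32(a, b);  // per-dword a > b
    // Either half less-than, copied to both dwords of the lane...
    __m128i z = _mm_or_si128(t, _mm_shuffle_epi32(t, 177));
    // ...unless the high half is greater, which decides the result.
    return _mm_andnot_si128(_mm_shuffle_epi32(u, 245), z);
}
```

On SSE4.2 hardware, _mm_cmpgt_epi64 with swapped operands does the same job in one instruction.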
Here are versions of "equals" and "less-than" that return a bool. They return the result of the comparison for the bottom 64-bit integer.
inline bool equals(__m128i a,__m128i b){
    __m128i t = _mm_cmpeq_epi32(a,b);
    // A 64-bit element is equal only if both 32-bit halves are equal.
    __m128i z = _mm_and_si128(t,_mm_shuffle_epi32(t,177));
    return _mm_cvtsi128_si32(z) & 1;
}
inline bool lessthan(__m128i a,__m128i b){
    // Flip the sign bits so the signed 32-bit compares order the halves as unsigned.
    a = _mm_xor_si128(a,_mm_set1_epi32(0x80000000));
    b = _mm_xor_si128(b,_mm_set1_epi32(0x80000000));
    __m128i t = _mm_cmplt_epi32(a,b);
    __m128i u = _mm_cmpgt_epi32(a,b);
    // Less if either half is less, unless the high half is greater.
    __m128i z = _mm_or_si128(t,_mm_shuffle_epi32(t,177));
    z = _mm_andnot_si128(_mm_shuffle_epi32(u,245),z);
    return _mm_cvtsi128_si32(z) & 1;
}