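How can I use SSE intrinsics to create a mask that indicates whether the signs of two packed floats (__m128's) are the same? For example, when comparing a and b, where a is [1.0 -1.0 0.0 2.0] and b is [1.0 1.0 1.0 1.0], the desired mask is [true false true true].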

2 Answers

Here is one solution:

const __m128i MASK = _mm_set1_epi32(0xffffffff);

__m128 a = _mm_setr_ps(1,-1,0,2);
__m128 b = _mm_setr_ps(1,1,1,1);

__m128  f = _mm_xor_ps(a,b);
__m128i i = _mm_castps_si128(f);

i = _mm_srai_epi32(i,31);
i = _mm_xor_si128(i,MASK);

f = _mm_castsi128_ps(i);

//  i = (0xffffffff, 0, 0xffffffff, 0xffffffff)
//  f = (0xffffffff, 0, 0xffffffff, 0xffffffff)

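In this snippet, both i and f end up holding the same bitmask. I assume you want it as an __m128, so I added f = _mm_castsi128_ps(i); to convert it back from __m128i.

Note that this code is sensitive to the sign of zero, so 0.0 versus -0.0 affects the result.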
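To make the signed-zero caveat concrete, here is a minimal sketch. The helper names same_sign_bits and zero_sign_demo are made up for illustration, and _mm_movemask_ps is used only to collapse the mask into a small integer that is easy to inspect. It assumes the code is built without fast-math options, which could drop the sign of -0.0f.

```c
#include <emmintrin.h>  /* SSE2 integer ops; also pulls in the SSE float intrinsics */

/* Hypothetical helper: runs the same xor / shift / invert steps as the
   snippet above, then packs the result into 4 bits.
   Bit k is 1 when lane k of a and b carry the same sign bit. */
static int same_sign_bits(__m128 a, __m128 b)
{
    __m128i i = _mm_castps_si128(_mm_xor_ps(a, b));
    i = _mm_srai_epi32(i, 31);                 /* smear the sign bit over the lane */
    i = _mm_xor_si128(i, _mm_set1_epi32(-1));  /* invert: all ones means "same sign" */
    return _mm_movemask_ps(_mm_castsi128_ps(i));
}

/* Hypothetical demo: lanes 0 and 1 hold +0.0 and -0.0, both compared with 1.0. */
static int zero_sign_demo(void)
{
    __m128 a = _mm_setr_ps(0.0f, -0.0f, 1.0f, -1.0f);
    __m128 b = _mm_set1_ps(1.0f);
    return same_sign_bits(a, b);
}
```

Here +0.0 counts as having the same sign as 1.0, while -0.0 does not, even though the two zeros compare equal as floats.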


Explanation:

The code works as follows:

f = _mm_xor_ps(a,b);       //  xor the sign bits (well all the bits actually)

i = _mm_castps_si128(f);   //  Convert it to an integer. There's no instruction here.

i = _mm_srai_epi32(i,31);  //  Arithmetic shift that sign bit into all the bits.

i = _mm_xor_si128(i,MASK); //  Invert all the bits

f = _mm_castsi128_ps(i);   //  Convert back. Again, there's no instruction here.
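
As a side note, the shift and invert steps could probably be folded into a single signed integer compare. This is a different technique from the one above, shown only as a sketch: the function name same_sign_mask_cmp is hypothetical, and it relies on the fact that a 32-bit lane is greater than -1 exactly when its sign bit is clear.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpgt_epi32, casts */

/* Hypothetical variant: returns all ones in each lane where a and b
   have the same sign bit, and all zeros elsewhere. */
static __m128 same_sign_mask_cmp(__m128 a, __m128 b)
{
    /* xor leaves the sign bit set only where the signs differ */
    __m128i x = _mm_castps_si128(_mm_xor_ps(a, b));
    /* signed compare: x > -1 holds exactly when the sign bit of x is clear */
    __m128i m = _mm_cmpgt_epi32(x, _mm_set1_epi32(-1));
    return _mm_castsi128_ps(m);
}
```

Like the original, this looks only at the sign bit, so it treats -0.0 as negative.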
answered 2011-12-09T03:53:37.087

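Have a look at the _mm_movemask_ps intrinsic, which extracts the most significant bit (i.e. the sign bit) from each of the 4 floats. See http://msdn.microsoft.com/en-us/library/4490ys29.aspx

For example, if you have [1.0 -1.0 0.0 2.0], written with the highest element first (the argument order of _mm_set_ps), then movemask_ps will return 4, or 0100 in binary. With _mm_setr_ps argument order the same values would give 2 instead. So if you get movemask_ps for each vector and combine the results (perhaps with a bitwise NOT of the XOR, keeping only the low 4 bits), each bit of the result indicates whether the signs of the corresponding elements are the same.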

a = [1.0 -1.0 0.0 2.0]
b = [1.0 1.0 1.0 1.0]
movemask_ps a = 4
movemask_ps b = 0
NOT (movemask_ps a XOR movemask_ps b), low 4 bits = 0xB, or binary 1011

Hence the signs are the same except in the second vector element.
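
A minimal sketch of this in C might look like the following. The function name same_sign_movemask is made up for illustration, and the "& 0xF" is needed because the bitwise NOT of an int also sets all the upper bits.

```c
#include <xmmintrin.h>  /* SSE: _mm_movemask_ps */

/* Hypothetical helper: bit k of the result is 1 when element k of a and b
   have the same sign bit. Only the low 4 bits are meaningful. */
static int same_sign_movemask(__m128 a, __m128 b)
{
    int sa = _mm_movemask_ps(a);  /* sign bits of a, one per element */
    int sb = _mm_movemask_ps(b);  /* sign bits of b, one per element */
    return ~(sa ^ sb) & 0xF;      /* NOT XOR, masked to the 4 lanes */
}
```

With a = _mm_set_ps(1, -1, 0, 2) and b = _mm_set_ps(1, 1, 1, 1) this gives 0xB as in the example above. Note that the result is a scalar integer rather than an __m128 mask, so it fits best when you want to branch on the signs rather than blend with them.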

answered 2011-12-09T04:13:53.537