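The _mm_shuffle_ps() intrinsic allows one to interleave floating-point inputs into the low 2 floats and high 2 floats of the output.

For example:

R = _mm_shuffle_ps(L1, H1, _MM_SHUFFLE(3,2,3,2));

will result in: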

R[0] = L1[2];
R[1] = L1[3];
R[2] = H1[2];
R[3] = H1[3];
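As a quick check of the lane mapping above, here is a minimal sketch in C. The helper name `shuffle_hi_pairs` and its array-based interface are made up for illustration; only `_mm_loadu_ps`, `_mm_shuffle_ps`, `_mm_storeu_ps` and `_MM_SHUFFLE` come from the SSE headers.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE */

/* Hypothetical helper: stores the four lanes of
   _mm_shuffle_ps(L1, H1, _MM_SHUFFLE(3,2,3,2)) into out[0..3]. */
static void shuffle_hi_pairs(const float l1[4], const float h1[4], float out[4])
{
    __m128 L1 = _mm_loadu_ps(l1);   /* unaligned load, element 0 first */
    __m128 H1 = _mm_loadu_ps(h1);

    /* The selector is read from the highest output lane down:
       out[3] = H1[3], out[2] = H1[2], out[1] = L1[3], out[0] = L1[2]. */
    __m128 R = _mm_shuffle_ps(L1, H1, _MM_SHUFFLE(3, 2, 3, 2));

    _mm_storeu_ps(out, R);
}
```

With L1 = {10, 11, 12, 13} and H1 = {20, 21, 22, 23} (element 0 first), this gives R = {12, 13, 22, 23}.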

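I was wondering whether there is a similar intrinsic available for integer data types? Something that takes two __m128i variables and a mask for interleaving?

The _mm_shuffle_epi32() intrinsic takes just one 128-bit vector instead of two.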

1 Answer

Nope, there is no integer equivalent to this. So you have to either emulate it, or cheat.

One method is to use _mm_shuffle_epi32() on each of the two inputs (call them A and B), then mask each result to keep only the desired elements and OR them together.

That tends to be messy and takes around 5 instructions. (Or 3 if you use the SSE4.1 blend instructions.)
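A rough sketch of that mask-and-OR emulation using only SSE2, for the fixed selector _MM_SHUFFLE(3,2,3,2). The function name `shuffle2_epi32_sse2` is hypothetical, and a compiler may well emit different instructions than the ones written here.

```c
#include <xmmintrin.h>  /* _MM_SHUFFLE */
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical helper: returns {a[2], a[3], b[2], b[3]} (element 0 first),
   i.e. the integer analogue of _mm_shuffle_ps(a, b, _MM_SHUFFLE(3,2,3,2)). */
static __m128i shuffle2_epi32_sse2(__m128i a, __m128i b)
{
    /* Bring elements 2 and 3 of each input into both halves. */
    __m128i as = _mm_shuffle_epi32(a, _MM_SHUFFLE(3, 2, 3, 2)); /* a2 a3 a2 a3 */
    __m128i bs = _mm_shuffle_epi32(b, _MM_SHUFFLE(3, 2, 3, 2)); /* b2 b3 b2 b3 */

    /* All-ones in elements 0 and 1, zero in elements 2 and 3. */
    const __m128i lo_mask = _mm_set_epi32(0, 0, -1, -1);

    __m128i lo = _mm_and_si128(as, lo_mask);     /* keep low half of as  */
    __m128i hi = _mm_andnot_si128(lo_mask, bs);  /* keep high half of bs */
    return _mm_or_si128(lo, hi);
}
```

That is two shuffles, an AND, an ANDNOT and an OR, which matches the count of about 5 instructions (not counting the load of the mask constant).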

Here's the SSE4.1 solution with 3 instructions:

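__m128i A = _mm_set_epi32(13,12,11,10);   // A = {10,11,12,13}, element 0 first
__m128i B = _mm_set_epi32(23,22,21,20);   // B = {20,21,22,23}

// _MM_SHUFFLE(3,2,3,2) == 2*1 + 3*4 + 2*16 + 3*64 (0xEE)
A = _mm_shuffle_epi32(A,_MM_SHUFFLE(3,2,3,2));   // A = {12,13,12,13}
B = _mm_shuffle_epi32(B,_MM_SHUFFLE(3,2,3,2));   // B = {22,23,22,23}

// 0xf0 takes the upper four 16-bit lanes (elements 2 and 3) from B
__m128i C = _mm_blend_epi16(A,B,0xf0);   // C = {12,13,22,23}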

The method that I prefer is to actually cheat and use the floating-point shuffle, like this:

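__m128i Ai,Bi,Ci;   // Ai and Bi hold the integer inputs
__m128  Af,Bf,Cf;

Af = _mm_castsi128_ps(Ai);   // reinterpret as float vectors
Bf = _mm_castsi128_ps(Bi);
Cf = _mm_shuffle_ps(Af,Bf,_MM_SHUFFLE(3,2,3,2));   // two-source shuffle
Ci = _mm_castps_si128(Cf);   // reinterpret back as integers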

This converts the datatype to floating-point so that the float shuffle can be used, then converts it back.

Note that these "conversions" are bitwise conversions (aka reinterpretations). No conversion is actually done and they don't map to any instructions. In the assembly, there is no distinction between an integer and a floating-point SSE register. These cast intrinsics are just to get around the type-safety imposed by C/C++.

However, be aware that on some processors this approach incurs extra latency for moving data back and forth between the integer and floating-point SIMD execution units. So it can be more expensive than just the shuffle instruction.
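For completeness, a small self-contained sketch of the cast approach wrapped in a helper. The name `shuffle2_epi32_via_ps` is hypothetical. Because the casts are reinterpretations and the shuffle only moves bits, integer values whose bit patterns would be NaNs as floats (such as -1) should pass through unchanged.

```c
#include <xmmintrin.h>  /* _mm_shuffle_ps, _MM_SHUFFLE */
#include <emmintrin.h>  /* __m128i, _mm_castsi128_ps, _mm_castps_si128 */

/* Hypothetical helper: returns {a[2], a[3], b[2], b[3]} (element 0 first)
   by routing the integer vectors through the float shuffle. */
static __m128i shuffle2_epi32_via_ps(__m128i a, __m128i b)
{
    __m128 af = _mm_castsi128_ps(a);   /* reinterpret only, no value conversion */
    __m128 bf = _mm_castsi128_ps(b);
    __m128 cf = _mm_shuffle_ps(af, bf, _MM_SHUFFLE(3, 2, 3, 2));
    return _mm_castps_si128(cf);       /* reinterpret back as integers */
}
```

To pick a different selector, the `_MM_SHUFFLE` arguments would have to be changed in the helper itself (or the helper turned into a macro), since the shuffle control must be a compile-time constant.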

Answered 2012-10-31T08:18:47.667