I have a 128-bit variable holding four separate integers: [1,2,3,4]. I want to shift it right by one element so that I get [2,3,4,0]. What is the fastest way to do this?

My current code:

__m128 v1;
v1 = (__m128)_mm_srli_si128(  _mm_castps_si128(v1) , 4 );

This succeeds in shifting the bits, but I am trying to optimize for speed and cache use (that is, as few variables as possible). Is there any way to improve this code to avoid casting to and from a __m128i?

thanks

1 Answer

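Don't worry. __m128 and __m128i are just two different ways of interpreting the contents of an XMM register, so the casts disappear during compilation. My compiler (clang on Mac OS 10.9) compiles the whole thing down to a single instruction: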

psrldq $0x4, %xmm0
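
For reference, here is a minimal sketch of the same shift wrapped in a function, assuming an x86 target with SSE2. The names shift_lanes_down and shift_ints_down are hypothetical and only for illustration. It uses _mm_castsi128_ps for the cast back instead of the C-style (__m128) cast, since the C-style cast relies on a compiler extension that may not be accepted everywhere. The cast intrinsics only reinterpret the register and should not generate instructions of their own.

```c
#include <emmintrin.h>  /* SSE2: _mm_srli_si128 and the cast intrinsics */
#include <stdint.h>

/* Hypothetical helper: shift the whole 128-bit register right by 4 bytes,
   so element 0 is dropped and a zero enters at element 3. */
static inline __m128 shift_lanes_down(__m128 v)
{
    /* The casts reinterpret the same XMM register; only the byte shift
       should remain as an actual instruction. */
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(v), 4));
}

/* Hypothetical wrapper that applies the shift to four 32-bit integers
   in memory, e.g. {1, 2, 3, 4} becomes {2, 3, 4, 0}. */
static void shift_ints_down(const int32_t in[4], int32_t out[4])
{
    /* Unaligned load/store so the arrays need no special alignment. */
    __m128 v = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)in));
    v = shift_lanes_down(v);
    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(v));
}
```

If the data is really integer data, keeping it in a __m128i throughout would remove the casts from the source as well, though the generated code should be the same either way.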
answered 2013-10-26T03:48:14.713