I have a 128-bit variable holding 4 separate 32-bit integers, [1,2,3,4]. I want to shift it right by one element so that I get [2,3,4,0]. What's the fastest way to do this?
My current code:
__m128 v1;
v1 = _mm_castsi128_ps( _mm_srli_si128( _mm_castps_si128(v1), 4 ) );
This succeeds in shifting the bits, but I am trying to optimize for speed and cache use (i.e. as few variables as possible). Is there any way to improve this code to avoid casting to and from a __m128i?
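For reference, here is a minimal self-contained sketch of what I am doing, assuming SSE2 is available and the lanes are 32-bit integers stored in memory order. The helper names shift_lanes_right and shift_array are made up for illustration. As far as I understand, the _mm_castps_si128 / _mm_castsi128_ps intrinsics only reinterpret the register type and do not generate any instructions, so the only real work should be the byte shift itself.

```c
#include <emmintrin.h>  /* SSE2: _mm_srli_si128 and the cast intrinsics */
#include <stdint.h>

/* Hypothetical helper: moves every 32-bit lane down one position and
   zero-fills the top lane, so [1,2,3,4] becomes [2,3,4,0]. */
static inline __m128 shift_lanes_right(__m128 v)
{
    /* Casts only change the type the compiler sees; no data is moved. */
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_srli_si128(bits, 4);  /* byte shift: 4 bytes = one 32-bit lane */
    return _mm_castsi128_ps(bits);
}

/* Hypothetical helper for checking the result on plain arrays:
   loads in[0..3], shifts, and stores the lanes to out[0..3]. */
static void shift_array(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128 r = shift_lanes_right(_mm_castsi128_ps(v));
    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(r));
}
```

Calling shift_array on {1, 2, 3, 4} should leave {2, 3, 4, 0} in the output array. The local __m128i here is just a name for the same register value, so I would not expect it to cost extra memory or cache traffic in an optimized build.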
Thanks.