You can use your memory values directly. For example:
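// 8*4 = 32 bytes, 16-byte aligned: room for two __m128i values
__m128i *p=static_cast<__m128i *>(_aligned_malloc(8*4,16));
for(int i=0;i<32;++i)
    reinterpret_cast<unsigned char *>(p)[i]=static_cast<unsigned char>(i);
// both operands are read straight from memory through the pointer
__m128i xyz=_mm_unpackhi_epi8(p[0],p[1]);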
The interesting part of the result:
; __m128i xyz=_mm_unpackhi_epi8(p[0],p[1]);
0040BC1B 66 0F 6F 00 movdqa xmm0,xmmword ptr [eax]
0040BC1F 66 0F 6F 48 10 movdqa xmm1,xmmword ptr [eax+10h]
0040BC24 66 0F 68 C1 punpckhbw xmm0,xmm1
0040BC28 66 0F 7F 04 24 movdqa xmmword ptr [esp],xmm0
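So the compiler is doing a slightly poor job here, presumably because it loads p[1] into xmm1 with a separate movdqa even though punpckhbw can take its second operand directly from aligned memory. Perhaps this form is faster, or perhaps playing with the compiler options would change it. Either way, it generates code that works, and the C++ code states what it wants fairly directly.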
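If you want to check what the intrinsic actually computes for these values, here is a minimal portable sketch. It is not the code from above: it assumes an `alignas(16)` stack buffer in place of the MSVC-specific `_aligned_malloc`, and the helper name `unpack_high_bytes_demo` is made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_unpackhi_epi8
#include <array>

// Hypothetical helper (not part of the original answer): fills 32 bytes
// with 0..31, treats them as two __m128i values, and returns the bytes
// of _mm_unpackhi_epi8(p[0], p[1]).
std::array<unsigned char, 16> unpack_high_bytes_demo()
{
    // Assumption: an aligned stack buffer stands in for _aligned_malloc.
    alignas(16) unsigned char src[32];
    for (int i = 0; i < 32; ++i)
        src[i] = static_cast<unsigned char>(i);

    const __m128i *p = reinterpret_cast<const __m128i *>(src);

    // Aligned loads are valid because src is 16-byte aligned.
    __m128i a = _mm_load_si128(p);      // bytes 0..15
    __m128i b = _mm_load_si128(p + 1);  // bytes 16..31

    // Interleave the high 8 bytes of a with the high 8 bytes of b.
    __m128i xyz = _mm_unpackhi_epi8(a, b);

    // Unaligned store, so the output array needs no special alignment.
    std::array<unsigned char, 16> out;
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out.data()), xyz);
    return out;
}
```

Calling `unpack_high_bytes_demo()` should give the bytes 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31: the upper halves of the two inputs, interleaved starting with the first operand.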