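Is there an Intel SSE instruction that can load floats from non-contiguous, evenly spaced memory addresses?

For example, given an array A = {0, 1, 2, 3, ..., n}, I would like to load {A[0], A[4], A[8], A[12]} into a 128-bit register in one go, and then {A[5], A[9], A[13], A[17]}.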

1 Answer

In this kind of use case you would typically load multiple contiguous vectors and then permute them into the required arrangement, using e.g. pshufd or punpckldq (or their floating-point counterparts, shufps and unpcklps).
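As a rough illustration of the load-then-permute approach, here is a minimal sketch that loads four contiguous vectors and transposes them, so that each result holds stride-4 elements. The helper name `load_stride4` and the `out` layout are hypothetical; `_MM_TRANSPOSE4_PS` is the standard macro from `<xmmintrin.h>`, which expands to a handful of shuffle/unpack instructions.

```c
#include <xmmintrin.h>  /* SSE intrinsics and _MM_TRANSPOSE4_PS */

/* Hypothetical helper: reads 16 contiguous floats starting at `a` and
 * writes four stride-4 vectors:
 *   out[k] = { a[k], a[k+4], a[k+8], a[k+12] }  for k = 0..3 */
void load_stride4(const float *a, float out[4][4])
{
    /* Four ordinary contiguous (unaligned) loads. */
    __m128 r0 = _mm_loadu_ps(a + 0);   /* a[0]  a[1]  a[2]  a[3]  */
    __m128 r1 = _mm_loadu_ps(a + 4);   /* a[4]  a[5]  a[6]  a[7]  */
    __m128 r2 = _mm_loadu_ps(a + 8);   /* a[8]  a[9]  a[10] a[11] */
    __m128 r3 = _mm_loadu_ps(a + 12);  /* a[12] a[13] a[14] a[15] */

    /* In-register 4x4 transpose: r0 becomes a[0] a[4] a[8] a[12], etc. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(out[0], r0);
    _mm_storeu_ps(out[1], r1);
    _mm_storeu_ps(out[2], r2);
    _mm_storeu_ps(out[3], r3);
}
```

In real code you would probably keep `r0`..`r3` in registers rather than storing them back; the stores are only there to make the result easy to inspect. Note that all four loaded vectors are put to use, so this works best when you need every stride-4 "column", not just one of them.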

Note that AVX2 (Haswell and later) adds gather load instructions (e.g. vgatherdps, exposed as the _mm_i32gather_ps intrinsic), which might also be worth considering.
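For comparison, a gather-based version might look something like the sketch below. The function name `gather_strided4` is hypothetical, and the `target("avx2")` attribute is a GCC/Clang-specific assumption that lets the function compile without passing `-mavx2` for the whole file; the caller must still make sure the CPU actually supports AVX2 before calling it.

```c
#include <immintrin.h>  /* AVX2 intrinsics, including _mm_i32gather_ps */

/* Hypothetical helper: gathers
 *   { base[0], base[stride], base[2*stride], base[3*stride] }
 * into one 128-bit vector and stores it to `out`.
 * Assumes GCC or Clang (target attribute) and an AVX2-capable CPU. */
__attribute__((target("avx2")))
void gather_strided4(const float *base, int stride, float out[4])
{
    /* Element indices; the scale argument below (4 = sizeof(float))
     * converts them to byte offsets and must be a compile-time constant. */
    __m128i idx = _mm_setr_epi32(0, stride, 2 * stride, 3 * stride);

    __m128 v = _mm_i32gather_ps(base, idx, 4);

    _mm_storeu_ps(out, v);
}
```

Calling `gather_strided4(A, 4, out)` would fetch A[0], A[4], A[8], A[12]. Whether this beats contiguous loads plus shuffles depends on the microarchitecture and the access pattern, so it is worth benchmarking both.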

Answered 2013-04-22T18:48:22.447