Suppose we have code like this:
float *data = (float*)_mm_malloc(N*sizeof(float), 16); // allocate a 16-byte aligned array of N elements
const int loop_bound1 = .....; // some value
const int loop_step = .....; // some value
const int loop_bound2 = ....; // some value
for(auto i=0; i<loop_bound1; i+=loop_step)
{
    auto inter_data1 = data + i; // inter_data1 may not be aligned
    for(int j=0; j<loop_bound2; ++j)
    {
        auto inter_data2 = inter_data1 + j; // inter_data2 may not be aligned either
        __m128 a = _mm_loadu_ps(inter_data2); // this works, but I want to use _mm_load_ps instead
    }
}
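Calling _mm_load_ps instead of _mm_loadu_ps requires inter_data1 and inter_data2 to be 16-byte aligned. What is the best way (safe and with minimal overhead) to align these pointers? I have considered std::align, but I'm not sure it is the right choice.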
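
To make the alignment requirement concrete, here is a minimal sketch of how it could be checked. It assumes sizeof(float) is 4, so a pointer into a 16-byte aligned float array is itself 16-byte aligned only when its element offset is a multiple of 4. The helpers is_aligned16 and count_aligned_offsets are hypothetical names introduced for illustration; they are not part of the code above or of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original code):
// returns true when p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: counts the element offsets k in [0, n)
// for which base + k is 16-byte aligned.
inline std::size_t count_aligned_offsets(const float* base, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t k = 0; k < n; ++k) {
        if (is_aligned16(base + k)) {
            ++count;
        }
    }
    return count;
}
```

Under that assumption, inter_data1 is aligned only if i is always a multiple of 4, and inter_data1 + j with ++j lands on an aligned address at most once every four iterations, so _mm_load_ps cannot be applied at every j without changing how the inner loop steps through the data.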