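I am trying to figure out the best way to precompute some sine and cosine values, store them in an aligned block, and then use them later in SSE calculations.
At the start of my program I create an object that has this member: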
static __m128 *m_sincos;
I then initialize that member in the constructor:
m_sincos = (__m128*) _aligned_malloc(Bins*sizeof(__m128), 16);
for (int t=0; t<Bins; t++)
    m_sincos[t] = _mm_set_ps(cos(t), sin(t), sin(t), cos(t));
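For reference, here is a minimal self-contained sketch of that setup with an explicit alignment check. The names make_sincos_table and is_aligned16 are hypothetical, and it uses _mm_malloc/_mm_free from the intrinsics header in place of _aligned_malloc, so it is an approximation of my code rather than the code itself.

```cpp
#include <cmath>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics, _mm_malloc / _mm_free

// Hypothetical helper: builds a table of `bins` vectors, each holding
// {cos(t), sin(t), sin(t), cos(t)}, in 16-byte aligned storage.
// Returns nullptr if the allocation fails; release with _mm_free.
inline __m128* make_sincos_table(int bins) {
    auto* table = static_cast<__m128*>(_mm_malloc(bins * sizeof(__m128), 16));
    if (table == nullptr) return nullptr;
    for (int t = 0; t < bins; ++t) {
        const float c = std::cos(static_cast<float>(t));
        const float s = std::sin(static_cast<float>(t));
        table[t] = _mm_set_ps(c, s, s, c);
    }
    return table;
}

// True when p sits on a 16-byte boundary, which movaps requires.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

Checking is_aligned16(&table[t]) right before the load should show whether the allocation itself is misaligned or whether the instruction is reading from some other address.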
When I go to use m_sincos, I run into three problems:
- The data does not appear to be aligned:
movaps xmm0, m_sincos[t] //crashes
movups xmm0, m_sincos[t] //does not crash
- The values that get loaded appear to be wrong:
movaps result, xmm0 // returns values that are not what is in m_sincos[t]
//Although, putting a watch on m_sincos[t] displays the correct values
- What really confuses me is that copying the entry into a local first makes everything work (but it is too slow):
__m128 _sincos = m_sincos[t];
movaps xmm0, _sincos
movaps result, xmm0
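For comparison, the same load can be sketched with intrinsics instead of inline assembly. load_entry is a hypothetical name, and the sketch assumes the table comes from a 16-byte aligned allocation. Here the compiler evaluates table + t as an ordinary C++ pointer expression, so the pointer is dereferenced before the aligned load. Inside an __asm block, as far as I understand, m_sincos[t] may instead be treated as an address expression on the pointer variable itself, which would fit both the crash and the wrong values.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: aligned load of entry t through the table pointer.
// `table` must point to 16-byte aligned storage, as _mm_load_ps requires.
inline __m128 load_entry(const __m128* table, int t) {
    return _mm_load_ps(reinterpret_cast<const float*>(table + t));
}
```

With optimizations enabled, a call like this would typically compile down to a single aligned load, with no extra copy through a local.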