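In a performance-critical C++ function, I would like to use SSE intrinsics to process some values. The function has an integer template parameter N, which can take values from 1 to 4 and gives the number of XMM registers I need.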

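I could write the function four times and be done with it; however, the function is very large, and I would like to keep it maintainable by avoiding code duplication. What I need is something like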

__m128d x[N];

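that is, I would like to have N different __m128d variables, just as if I had instantiated a stack array of them. The code above does not work, though, because it creates an array of doubles on the stack and "maps" it to some XMM registers.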

In other words, I would like to write a loop like this:

for (int i = 0; i < N; ++i) {
    k = _mm_add_pd(x[i], k);
}

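(this is just an example; the real code is much more complex). The compiler I am using optimizes it quite well, but the expression x[i] is not what I want: the generated code reads it from memory, whereas I would like the values to stay in XMM registers, with no reads from or writes to main memory.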

Any ideas? Thanks.

1 Answer

The usual and obvious way to do this is with the preprocessor. You probably know about the ## token-pasting operator:

#define inc(n) x##n++

inc(1);  // expands to x1++;
inc(2);  // expands to x2++;

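Applied to the question, token pasting lets each value live in its own named __m128d variable instead of an array element. The following is only a minimal sketch: the names ACC, ADD_INTO and sum2x2 are made up for illustration, two accumulators stand in for N, and whether the values really stay in XMM registers still has to be checked in the generated assembly.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (__m128d, _mm_add_pd, ...)

// Hypothetical helper macros: ACC(0) names the variable acc0, ACC(1) names acc1.
#define ACC(n) acc##n
// Add the two doubles at src into accumulator number n.
#define ADD_INTO(n, src) ACC(n) = _mm_add_pd(ACC(n), _mm_loadu_pd(src))

// Sums 4 * blocks doubles using two separately named accumulators.
inline double sum2x2(const double* data, int blocks) {
    __m128d ACC(0) = _mm_setzero_pd();
    __m128d ACC(1) = _mm_setzero_pd();
    for (int b = 0; b < blocks; ++b) {
        ADD_INTO(0, data + 4 * b);      // first pair of the block
        ADD_INTO(1, data + 4 * b + 2);  // second pair of the block
    }
    // Combine the accumulators and reduce the two lanes to a scalar.
    __m128d total = _mm_add_pd(ACC(0), ACC(1));
    double lanes[2];
    _mm_storeu_pd(lanes, total);
    return lanes[0] + lanes[1];
}
```

The price is that every "iteration" over the registers has to be written out by hand (or generated by further macros), which is why this tends to suit a small, fixed N such as 1 to 4.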
Another possibility is templates and inline functions. C++ compilers are usually very good at inlining as much as possible, and they can even keep the fields of a large structure in registers. The one thing they don't handle well is arrays, probably because their contents may be overwritten through aliased pointers. You could try telling the compiler to assume no pointer aliasing, but I doubt it will help.
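One way the template idea might look is a recursively defined structure that holds N separate __m128d fields, with small inline functions that recurse over them at compile time, so x[i] never appears. This is a sketch under assumptions, not a guaranteed fix: Regs, load_all, add_all and sum_pairs are hypothetical names, and the register allocation depends on the compiler and optimization level, so inspect the assembly.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical holder of N separate __m128d fields (no array involved).
template <int N>
struct Regs {
    __m128d head;
    Regs<N - 1> tail;
};
template <>
struct Regs<1> {
    __m128d head;
};

// Load 2 * N doubles from p, one pair per field.
inline void load_all(Regs<1>& r, const double* p) { r.head = _mm_loadu_pd(p); }
template <int N>
inline void load_all(Regs<N>& r, const double* p) {
    r.head = _mm_loadu_pd(p);
    load_all(r.tail, p + 2);
}

// Compile-time equivalent of: for (i = 0; i < N; ++i) k = _mm_add_pd(x[i], k);
inline __m128d add_all(const Regs<1>& r, __m128d k) { return _mm_add_pd(r.head, k); }
template <int N>
inline __m128d add_all(const Regs<N>& r, __m128d k) {
    return add_all(r.tail, _mm_add_pd(r.head, k));
}

// Small driver: sums 2 * N doubles through the N fields.
template <int N>
inline double sum_pairs(const double* p) {
    Regs<N> x;
    load_all(x, p);
    __m128d k = add_all(x, _mm_setzero_pd());
    double lanes[2];
    _mm_storeu_pd(lanes, k);
    return lanes[0] + lanes[1];
}
```

Calling sum_pairs<4>(data) instantiates the recursion four levels deep; once everything is inlined, the compiler sees four independent local values rather than an indexed array, which is the situation the answer says optimizers handle well.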

answered 2014-02-12T11:07:53.687