c - SIMD (SSE) instruction for division in GCC

Question

If possible, I'd like to optimize the following code snippet using SSE instructions:

/*
 * the data structure
 */
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };

/*
 * the part that should be "optimized"
 */
tmp.x /= 4.0;
tmp.y /= 4.0;
tmp.z /= 4.0;

Is this possible?

score 1 · Accepted Answer

The intrinsic you are looking for is _mm_div_pd. Here is a working example that should be enough to steer you in the right direction:

#include <stdio.h>

#include <emmintrin.h>

typedef struct
{
    double x;
    double y;
    double z;
} v3d;

typedef union __attribute__ ((aligned(16)))
{
    v3d a;
    __m128d v[2];
} u3d;

int main(void)
{
    const __m128d vd = _mm_set1_pd(4.0);
    u3d u = { { 1.0, 2.0, 3.0 } };

    printf("v (before) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    /* v[0] overlays x and y; v[1] overlays z plus 8 bytes of padding */
    u.v[0] = _mm_div_pd(u.v[0], vd);
    u.v[1] = _mm_div_pd(u.v[1], vd);

    printf("v (after) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    return 0;
}
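As an alternative to the union, the same division can be sketched with explicit load and store intrinsics, which leaves the original struct layout untouched. This is only an illustration: the helper name v3d_div is hypothetical, and it assumes x and y sit next to each other in memory with no padding between them (true for three plain doubles).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef struct {
    double x;
    double y;
    double z;
} v3d;

/* Hypothetical helper: divide all three components of *p by d. */
static void v3d_div(v3d *p, double d)
{
    const __m128d vd = _mm_set1_pd(d);

    /* x and y: unaligned load of two adjacent doubles
     * (assumes no padding between x and y) */
    __m128d xy = _mm_loadu_pd(&p->x);
    /* z: load one double into the low lane; the high lane is zeroed */
    __m128d z = _mm_load_sd(&p->z);

    xy = _mm_div_pd(xy, vd);  /* divides both lanes */
    z = _mm_div_sd(z, vd);    /* divides only the low lane */

    _mm_storeu_pd(&p->x, xy);
    _mm_store_sd(&p->z, z);
}
```

Calling v3d_div(&tmp, 4.0) on the struct from the question would replace the three scalar divisions. The unaligned load and store are used because a plain v3d is not guaranteed to be 16-byte aligned.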

score 1 · Accepted Answer

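I've used SIMD extensions under Windows, but not yet under Linux. That being said, you should be able to take advantage of the SSE operation DIVPS, which divides one 4-float vector by another 4-float vector. But you're using doubles, so you'll want the SSE2 version, DIVPD. I almost forgot: make sure to build with the -msse2 switch.

I found a page that details some of the SSE GCC builtins. It looks a bit old, but it should be a good start.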

http://ds9a.nl/gcc-simd/
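For illustration, here is a minimal sketch using GCC's generic vector extension rather than the intrinsics. The type name v2df and the helper v2df_div are hypothetical names chosen for this example. On x86 with SSE2 enabled, GCC should typically be able to compile the division to a single DIVPD, though that depends on the compiler version and flags.

```c
/* GCC/Clang vector extension: a 16-byte vector holding two doubles. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical helper: element-wise a / b on two-double vectors. */
static v2df v2df_div(v2df a, v2df b)
{
    return a / b;
}
```

A call such as v2df_div((v2df){ 1.0, 2.0 }, (v2df){ 4.0, 4.0 }) divides both lanes at once, and individual lanes can be read back with subscripts (r[0], r[1]) on reasonably recent GCC versions.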

score 1 · Accepted Answer

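Is tmp.x *= 0.25; enough?

Note that for SSE instructions (if you want to use them), it is important that:

1) all memory accesses are 16-byte aligned

2) the operations are performed in a loop

3) no int <-> float or float <-> double conversions are performed

4) divisions are avoided where possible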
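The points above can be sketched together as a loop over an aligned array that multiplies by 0.25 instead of dividing by 4.0. Because 0.25 is exactly representable as a double, the multiplication gives the same results as the division. The helper name scale_quarter is hypothetical, and the code assumes the caller passes a 16-byte-aligned buffer.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: scale n doubles in place by 0.25.
 * Assumes data is 16-byte aligned. */
static void scale_quarter(double *data, size_t n)
{
    const __m128d k = _mm_set1_pd(0.25);
    size_t i = 0;

    /* Process two doubles per iteration with aligned loads and stores. */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_load_pd(data + i);
        _mm_store_pd(data + i, _mm_mul_pd(v, k));
    }

    /* Scalar tail for an odd element count. */
    if (i < n)
        data[i] *= 0.25;
}
```

With GCC, a buffer can be declared as double buf[N] __attribute__((aligned(16))) to satisfy the alignment assumption. For a single three-element struct the gain is likely to be small; this kind of loop pays off mainly when many values are processed at once.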
