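Many SSE "mov" instructions specify that they are moving floating-point values. For example:
- MOVHLPS: Move Packed Single-Precision Floating-Point Values High to Low
- MOVSD: Move Scalar Double-Precision Floating-Point Value
- MOVUPD: Move Unaligned Packed Double-Precision Floating-Point Values
Why don't these instructions simply say that they move 32-bit or 64-bit values? If they are just moving bits around, why do the instructions specify that they are for floating-point values? Surely they would work whether or not you interpret those bits as floating point?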
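I think I've found the answer: some microarchitectures execute floating-point instructions on different execution units than integer instructions. You get better overall latency when a stream of instructions stays within the same "domain" (integer or floating point), because forwarding a result from one domain to the other can cost extra cycles. This is covered in good detail in Agner Fog's optimization manual, in the section titled "Data bypass delays": http://www.agner.org/optimize/microarchitecture.pdf
I found this explanation in this similar SO question: Difference between MOVDQA and MOVAPS x86 instructions?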
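To make the "they're just moving bits" point concrete, here is a minimal sketch using SSE2 intrinsics. The helper names (`load_as_float`, `load_as_int`, `same_bits`) are made up for illustration. It assumes an x86 target with SSE2, and that the compiler maps `_mm_loadu_ps` to MOVUPS and `_mm_loadu_si128` to MOVDQU, which is typical but not guaranteed.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (pulls in SSE as well)
#include <cassert>
#include <cstdint>

// Load 16 bytes with the floating-point-typed move (typically MOVUPS).
inline __m128 load_as_float(const void* p) {
    return _mm_loadu_ps(static_cast<const float*>(p));
}

// Load the same 16 bytes with the integer-typed move (typically MOVDQU).
inline __m128i load_as_int(const void* p) {
    return _mm_loadu_si128(static_cast<const __m128i*>(p));
}

// True when both loads produced exactly the same 128 bits.
inline bool same_bits(const void* p) {
    // The cast only changes the C++ type; it generates no instruction.
    __m128i a = _mm_castps_si128(load_as_float(p));
    __m128i b = load_as_int(p);
    // Byte-wise compare, then collect one bit per byte (16 bits total).
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Both loads return the same bits even for NaN patterns, so the choice between them affects only which domain the value lands in, not the result.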
In case anyone cares, this is exactly why Agner Fog's vectorclass has separate vector classes for use with boolean float (Vec4fb) and boolean integer (Vec4i): http://www.agner.org/optimize/#vectorclass
In his manual he writes: "The reason why we have defined a separate Boolean vector class for use with floating point vectors is that it enables us to produce faster code. (Many modern CPU's have separate execution units for integer vectors and floating point vectors. It is sometimes possible to do the Boolean operations in the floating point unit and thereby avoid the delay from moving data between the two units)."
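As a rough illustration of doing the Boolean operations in the floating-point unit, here is a sketch with raw SSE intrinsics. It is not the vectorclass implementation, and `select_ps` and `clamp_negative_to_zero` are hypothetical names. The comparison mask stays a `__m128` and is combined with ANDPS/ANDNPS/ORPS, rather than being cast to `__m128i` and combined with PAND/PANDN/POR. It assumes the compiler emits the instruction matching each intrinsic, which it is free not to do.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>

// Per-lane select: where mask is all-ones take a, otherwise take b.
// Uses only float-domain logic instructions (ANDPS, ANDNPS, ORPS).
inline __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    // _mm_andnot_ps(mask, b) computes (~mask) & b.
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Replace every lane that is not greater than zero with +0.0f.
inline __m128 clamp_negative_to_zero(__m128 x) {
    __m128 zero = _mm_setzero_ps();
    __m128 mask = _mm_cmpgt_ps(x, zero);  // CMPPS: all-ones where x > 0
    return select_ps(mask, x, zero);
}
```

The compare, the mask logic, and any arithmetic that follows all stay typed as floating point, so on CPUs with a bypass delay between domains the data never has to cross over.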
Most questions about SSE and AVX can be answered by reading his manual and, more importantly, by looking at the code in his vectorclass.