8

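Not long ago I read somewhere that SSE intrinsics compile to efficient machine code because the compiler treats them differently from ordinary functions. I am wondering how the compiler actually does this, and what a C programmer can do to help the process along. Are there any guidelines on how to use intrinsics in a way that makes it easier for the compiler to generate efficient machine code?

Thanks.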

4

2 Answers

7

Contrary to what Necrolis wrote, the intrinsics may or may not compile down to the instructions they represent. This is especially true for copy or load instructions such as _mm_load_pd, since the compiler is still responsible for register allocation and assignment when using intrinsics. This means that copying a value from one location to another may not be necessary at all, if the two locations can be represented by the same register. In that case the compiler may choose to remove the copy. It may also choose to remove other instructions if the result is never used.
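To make this concrete, here is a minimal sketch, assuming an x86 target with SSE2 and a hypothetical helper named `add_pairs`. It uses `_mm_loadu_pd` rather than `_mm_load_pd` only so that the arrays need not be 16-byte aligned. The variable `copy` and the unused product are there purely to illustrate things an optimizing compiler is free to drop; what it actually emits depends on the compiler and flags, so check the generated assembly (for example with `gcc -O2 -S`).

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_loadu_pd, _mm_add_pd, _mm_mul_pd, _mm_storeu_pd */

/* Hypothetical helper (not from the original answer):
   out[i] = a[i] + b[i] for i = 0 and 1. */
void add_pairs(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);   /* unaligned load of a[0], a[1] */
    __m128d vb = _mm_loadu_pd(b);   /* unaligned load of b[0], b[1] */

    /* A copy at the source level. The compiler does register allocation,
       so va and copy can share one register and no move is needed. */
    __m128d copy = va;

    /* The product is never used, so the compiler may remove the multiply. */
    __m128d unused = _mm_mul_pd(va, vb);
    (void)unused;

    _mm_storeu_pd(out, _mm_add_pd(copy, vb));
}
```

With optimization enabled, one would typically expect little more than the loads, one add and one store here, but that is an expectation to verify, not a guarantee.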

Check out this blog post where the behavior of different compilers is compared in practice. It's from 2009, so the details may no longer apply. However, newer compilers are likely to optimize your code more, not less.

As for actually using intrinsics efficiently, the answer is the same as for all other performance optimization: measure, measure, and measure. Make sure that you are actually dealing with a hot piece of code, find out why it's slow, and then improve it. You are very likely to find that improving your memory access patterns is more important than using intrinsics.
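As a rough illustration of the memory access point, the sketch below sums the same matrix twice, once along rows and once along columns, and times each pass with `clock()`. The names `grid`, `N`, `fill_grid`, `sum_row_major`, `sum_col_major` and `time_call` are all hypothetical, and `clock()` is a coarse timer; a real measurement would use a profiler or repeated runs.

```c
#include <stddef.h>
#include <time.h>

#define N 1024  /* hypothetical matrix dimension */

static float grid[N][N];

/* Fill the matrix with small integer values so the sums are exact. */
void fill_grid(void)
{
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            grid[i][j] = (float)((i + j) % 7);
}

/* Inner loop walks consecutive addresses (C arrays are row-major). */
double sum_row_major(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            s += grid[i][j];
    return s;
}

/* Inner loop jumps N floats at a time, which tends to be cache-unfriendly. */
double sum_col_major(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; ++j)
        for (size_t i = 0; i < N; ++i)
            s += grid[i][j];
    return s;
}

/* Run f once, store its result, and return the CPU time used in seconds. */
double time_call(double (*f)(void), double *result)
{
    clock_t t0 = clock();
    *result = f();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `fill_grid()` and then `time_call(sum_row_major, &r)` and `time_call(sum_col_major, &r)` gives two timings to compare. On many machines the column-wise pass is noticeably slower even though both compute the same sum, but the size of the gap depends on the hardware and compiler.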

answered 2013-05-09T14:44:38.203
6

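Intrinsics compile down to the instructions they represent; whether that is efficient depends on how they are used.

Also, every compiler treats intrinsics slightly differently (that is, the behavior is implementation-specific). GCC is open source, so you can see how it handles SSE intrinsics. Open Watcom*, LCC, PCC and TCC* are all open-source C compilers as well; although they do not have SSE intrinsics, they should still have intrinsics of some kind, and you can see how they handle those.

I think what you read was about auto-vectorization of code, which GCC (see) and ICC are quite good at, although their output is not as good as hand-optimized code, at least not yet.

*May have been updated with SSE support since then; I haven't checked recently...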
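For comparison, here is a minimal sketch of the two approaches, assuming an x86 target with SSE; the function names `add_scalar` and `add_sse` are hypothetical. The first is a plain loop that an auto-vectorizing compiler may turn into SSE code by itself (with GCC, for example, at `-O3`); the second spells out the vector operations with intrinsics.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Plain loop: a candidate for auto-vectorization. `restrict` promises the
   compiler that the arrays do not overlap, which makes vectorizing easier. */
void add_scalar(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Hand-written version: four floats per iteration, then a scalar tail. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)                     /* remaining 0 to 3 elements */
        out[i] = a[i] + b[i];
}
```

For a loop this simple the two versions may well compile to similar code; whether hand-written intrinsics win in a given case is something to measure and to confirm in the generated assembly.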

answered 2011-04-15T15:59:36.820