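I am learning assembly and doing some inline assembly in my Digital Mars C++ compiler. I searched for ways to make a program faster and ended up with this list of things to tune: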
Use a better C++ compiler (thinking of GCC or the Intel compiler).
Use assembly only in the critical parts of the program.
Find a better algorithm.
Cache miss, cache contention.
Loop-carried dependency chain.
Instruction fetching time.
Instruction decoding time.
Instruction retirement.
Register read stalls.
Execution port throughput.
Execution unit throughput.
Suboptimal reordering and scheduling of micro-ops.
Branch misprediction.
Floating point exception.
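I understand all of these except "register read stalls".
Question: Can anyone tell me how this happens in a CPU, and in the "superscalar" form of "out-of-order execution"? Plain "out-of-order" seemed logical, but I could not find a logical explanation of the "superscalar" form.
Question 2: Can someone also point me to a good instruction list for SSE, SSE2 and newer CPUs, preferably with micro-op tables, port throughputs, execution units and some calculation tables, so that I can find the real bottleneck of a piece of code?
I would be happy with a small example like this: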
//loop carried dependency chain breaking:
__asm
{
loop_begin:
....
....
sub edx,05h //rather than taking i*5 in each iteration, we sub 5 each iteration
sub ecx,01h //i-- counter
...
...
jnz loop_begin //edit: sub ecx has to come after sub edx, so that jnz tests the flags set by the counter
}
//sub edx gets rid of a multiplication and also makes edx independent of ecx, so the two no longer depend on each other
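To make it clearer what I mean, here is the same idea as a plain C++ sketch. The function names `sum_with_multiply` and `sum_with_running_offset` are made up for illustration, and I am only assuming that the elided parts of the loop use the value in edx; an optimizing compiler may well do this transformation on its own.

```cpp
#include <cstdint>

// Hypothetical helper: recomputes i*5 with a multiply in every iteration,
// so the offset always depends on the loop counter.
std::uint32_t sum_with_multiply(std::uint32_t n)
{
    std::uint32_t total = 0;
    for (std::uint32_t i = n; i != 0; --i) {
        total += i * 5;              // multiply inside the loop
    }
    return total;
}

// Hypothetical helper: keeps a running offset and subtracts 5 per iteration,
// which is what `sub edx,05h` does in the asm loop above.
std::uint32_t sum_with_running_offset(std::uint32_t n)
{
    std::uint32_t total = 0;
    std::uint32_t offset = n * 5;    // one multiply before the loop
    for (std::uint32_t i = n; i != 0; --i) {
        total += offset;
        offset -= 5;                 // does not read the counter i
    }
    return total;
}
```

Both functions should return the same value for any `n`; the second one only replaces the per-iteration multiply with a subtraction on a separate variable.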
Thank you.
Computer: Pentium-M 2 GHz, Windows XP 32-bit