c++ - Optimizing for pipelined execution

Question

In a pipelined architecture, instructions are broken down into smaller stages before execution, which lets them run much faster. However, an instruction that uses a register being written by a preceding instruction cannot proceed until that preceding instruction has produced its result. Does it make sense, then, to reorder instructions so that those accessing the same register (or RAM cell) are placed as far apart from each other as possible? Or is this pointless because the compiler already performs this optimization itself?

For example:

int a = 1, b = 2, c = 3;
a *= a;
b *= a;  // stalls until the result of a *= a is ready
c *= c;

Optimized:

int a = 1, b = 2, c = 3;
a *= a;
c *= c;  // independent of a, so it can run while a *= a is still in progress
b *= a;

score 1 · Accepted Answer

It depends on your compiler and architecture. Modern x86 processors support out-of-order execution, which means the processor doesn't need to execute instructions in program order. Instead, it looks ahead through a window of upcoming instructions (a fairly large one, in fact) and starts each instruction as soon as its inputs are ready, rather than strictly in the order in which the instructions appear. So for out-of-order CPUs this optimization is largely unnecessary: the actual execution order does not strictly follow the order of the instructions in the code.

For in-order architectures (e.g. Cell), the order of instructions does matter. However, a properly optimizing compiler is quite likely to do this reordering itself in many cases, as long as it can prove that the reordering won't change the behaviour of the code. It is most likely to fail when pointers (or volatile variables) are involved: in most cases the compiler can't prove that two different pointers don't point to the same variable, and accesses to volatile variables must be kept in order. Qualifiers like __restrict (a common compiler extension, not standard C++) can help in the pointer case.

Another point to consider is that in many cases the latency of operations like an integer multiplication has no real effect on the runtime, because the performance of many programs is limited mainly by memory access. Where it does make a difference, it is often more useful to think about SIMD and/or multithreading than about instruction placement.

In conclusion, I would say that this kind of optimization isn't really useful in a compiled language (when writing assembly the situation can be different), since both the CPU and the compiler may change the order anyway, and it may not even make a difference. That doesn't mean there are no situations where it is useful, but they are limited to the most performance-critical code paths, and only once it has been shown that the compiler or CPU isn't up to the task.
