1

Is there some advantage to knowing at compile time how many iterations a for loop will perform?

For example, in some situations, can the compiler produce an executable that will run faster given this:

#define ITERATIONS 10
int foo()
{
    for (int i=0; i < ITERATIONS; i++){
        do_something();
    }
}

than given this:

int foo(int iterations)
{
    for (int i=0; i < iterations; i++){
        do_something();
    }
}

If this is not universally the case, in what situations does it help?

My concern is with the specific case of OpenCL, so I'm also interested to know whether this differs from C.


5 Answers

7

I tested this in a fairly realistic situation using GCC. When the number of iterations is known at compile time, I get this:

.L2:
    call    do_something
    subl    $1, %ebx
    jne .L2

When it's not, I get this:

.L6:
    call    do_something
    addl    $1, %ebx
    cmpl    %ebp, %ebx
    jne .L6

So it was able to optimize the fixed number of iterations slightly better by changing to a count-down-to-zero loop rather than a count-up loop. If nothing else, this uses a bit less code cache.
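
In other words (a sketch of the transformation, not actual compiler output), the compiler effectively turned the fixed-count loop into a countdown, where the decrement itself sets the flags the branch tests, so no separate compare instruction is needed:

int n = ITERATIONS;
do {
    do_something();
} while (--n != 0);   /* corresponds to the subl/jne pair above */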

With higher optimization levels, it fully unrolls the loop that invokes an external function ten times. Presumably, it wouldn't do that unless it thought it was better. And it certainly can't do that if the number of iterations is unknown.

Short answer: Fixed numbers of iterations give compilers more options. That should result in very slightly better code at least some of the time.
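
For reference, a comparison along these lines can be reproduced with something like the following (the file name, function names and flags are illustrative, and the exact assembly will depend on the GCC version):

/* loops.c -- do_something() lives in another translation unit,
   so the calls cannot be optimized away */
void do_something(void);

#define ITERATIONS 10

void foo_fixed(void)
{
    for (int i = 0; i < ITERATIONS; i++) {
        do_something();
    }
}

void foo_variable(int iterations)
{
    for (int i = 0; i < iterations; i++) {
        do_something();
    }
}

/* Compare the generated code, e.g.:
 *   gcc -O2 -S loops.c -o loops_O2.s
 *   gcc -O3 -S loops.c -o loops_O3.s
 * At the higher level the fixed-count loop may be fully unrolled,
 * as described above.
 */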

answered 2012-10-18T12:15:12.040
4

Indeed, it depends on the compiler. But in practice it enables the unrolling of loops. Here is an Intel example and an AMD example.

With NVIDIA you need to add this in your kernel: #pragma OPENCL EXTENSION cl_nv_pragma_unroll : enable.

So you can pass a flag when compiling, like: -DITERATIONS=10
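
As a rough sketch only (the kernel name, its argument and the loop body are made up for illustration), putting it together could look like this, with ITERATIONS supplied at build time by passing the option string "-DITERATIONS=10" to clBuildProgram:

#pragma OPENCL EXTENSION cl_nv_pragma_unroll : enable   /* NVIDIA only */

__kernel void scale(__global float *data)
{
    int gid = get_global_id(0);
    float acc = 0.0f;

    #pragma unroll                            /* hint enabled by the extension above */
    for (int i = 0; i < ITERATIONS; i++) {    /* ITERATIONS comes from -DITERATIONS=10 */
        acc += data[gid];
    }

    data[gid] = acc;
}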

answered 2012-10-18T12:21:37.560
3

You need to initialise i,

for (int i; i < ITERATIONS; i++){

is undefined behaviour and allows the compiler to entirely skip the loop ;)

Apart from that, if the number of iterations is known at compile time, the compiler can, for example, completely unroll the loop, which can make a big difference (if the loop body is cheap).
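
To illustrate what full unrolling means (a sketch of the effective result, not actual compiler output), with a small fixed count the compiler can replace the loop, its counter and its branch with straight-line calls:

/* Original */
for (int i = 0; i < 4; i++) {
    do_something();
}

/* Effectively what full unrolling produces */
do_something();
do_something();
do_something();
do_something();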

answered 2012-10-18T12:07:41.353
1

This will depend a lot on the compiler and the optimization settings you are using. Most likely, knowing the number of iterations will allow the compiler to unroll the loop and get rid of the branches and the loop counter i.

This is, however, completely compiler dependent, so don't count on it. But if you have a modern compiler, chances are good it will improve speed.

edit:

I overlooked the obviously uninitialized i, but you probably did too...

answered 2012-10-18T12:09:07.250
1

In the case of gcc-4.5.3, with do_something() defined as void do_something(void) { printf("hello world"); }:

  • Without optimization or at -O2, the compiler does not unroll either loop.
  • At -O4, and only for the compile-time-known number of iterations, the compiler unrolls the loop and inlines do_something():


foo:
...
.L4:
        addl    $1, %ebx
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        cmpl    %ebx, %esi
        jg      .L4

main:
...
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
...
answered 2012-10-18T12:21:45.457