I am trying to see how loop unrolling is done in GCC. To test it, I wrote a small C program that adds two arrays element by element:

    for (i = 0; i < 16384; i++)
        c[i] = a[i] + b[i];

I compiled it with the -o2 and -funroll-all-loops flags:

    gcc -o2 -funroll-all-loops --save-temps pleaseUnrollTheLoops.c

The assembly file kept by --save-temps contains the following code for the loop:

    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $196504, %rsp
    movl    $0, -196612(%rbp)
    jmp .L2
    .L3:
    movl    -196612(%rbp), %eax
    movl    -196608(%rbp,%rax,4), %edx
    movl    -196612(%rbp), %eax
    movl    -131072(%rbp,%rax,4), %eax
    addl    %eax, %edx
    movl    -196612(%rbp), %eax
    movl    %edx, -65536(%rbp,%rax,4)
    addl    $1, -196612(%rbp)
    .L2:
    cmpl    $16383, -196612(%rbp)
    jle .L3
    .cfi_def_cfa 7, 8

Each iteration performs only one addition (the addl %eax, %edx instruction in the .L3 loop body) and then increments the loop counter stored at -196612(%rbp) by 1 (the addl $1, -196612(%rbp) instruction at the end of the loop body). This indicates that the compiler is not unrolling the loop; I was expecting several additions per iteration. Why is the loop not unrolled even though I passed -funroll-all-loops? Is it possible that the compiler decided unrolling is not useful in this case? If so, what should I do to make the compiler unroll the loop?
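To make clear what I mean by "more additions per iteration", here is a rough hand-written sketch of the shape I expected the compiler to produce. The function names, the int element type, and the unroll factor of 4 are my own assumptions for illustration; GCC would pick its own factor and would not necessarily generate code equivalent to this.

```c
#include <stddef.h>

/* Rolled form: one addition per iteration, like my original loop. */
void add_rolled(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hand-unrolled by 4 (illustrative factor, not what GCC necessarily picks):
 * four additions per iteration, then a remainder loop for leftover elements. */
void add_unrolled4(const int *a, const int *b, int *c, size_t n)
{
    size_t i = 0;

    /* Main unrolled loop: runs while at least 4 elements remain. */
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With n = 16384 the remainder loop never runs, since 16384 is a multiple of 4. In the assembly I would then expect to see four addl instructions between consecutive loop-counter updates instead of one.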


