I am trying to see how GCC unrolls loops. To do this, I wrote a small C program that adds the elements of two arrays:
for (i = 0; i < 16384; i++)
    c[i] = a[i] + b[i];
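For reference, here is a minimal self-contained version of that loop. The wrapper name `add_arrays` and the `int` element type are assumptions for illustration. The type is consistent with the 4-byte `movl` loads and the 65536-byte (16384 × 4) spacing between the three arrays in the assembly below, but the original declarations are not shown.

```c
#define N 16384 /* trip count from the question's loop */

/* Hypothetical wrapper around the question's loop: c[i] = a[i] + b[i]
   for every i in [0, N). The int element type is an assumption. */
void add_arrays(int *c, const int *a, const int *b)
{
    int i; /* 32-bit signed counter, matching the cmpl/cltq in the assembly */
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}
```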
I compiled it with the -o2 flag and -funroll-all-loops:
gcc -o2 -funroll-all-loops --save-temps pleaseUnrollTheLoops.c
The assembly that --save-temps kept for this program (the .s file) is:
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $196504, %rsp
movl $0, -196612(%rbp)
jmp .L2
.L3:
movl -196612(%rbp), %eax
cltq
movl -196608(%rbp,%rax,4), %edx
movl -196612(%rbp), %eax
cltq
movl -131072(%rbp,%rax,4), %eax
addl %eax, %edx
movl -196612(%rbp), %eax
cltq
movl %edx, -65536(%rbp,%rax,4)
addl $1, -196612(%rbp)
.L2:
cmpl $16383, -196612(%rbp)
jle .L3
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
In each iteration it performs only one addition (the 7th instruction after the .L3 label, addl %eax, %edx) and increments the loop counter i, which is stored in memory at -196612(%rbp), by 1 (the last instruction before .L2, addl $1, -196612(%rbp)). This suggests that the compiler is not unrolling the loop; I was expecting several additions per iteration. My question is: why isn't the loop unrolled even though I passed -funroll-all-loops? Is it possible that the compiler decides unrolling is not useful in this case? If so, what should I do to make the compiler unroll the loop?
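To make "more additions per iteration" concrete, here is a sketch of the same loop unrolled by hand by a factor of 4. The function name `add_arrays_unrolled4` and the unroll factor are hypothetical choices for illustration; this shows the general shape of an unrolled loop at the C level, not the exact code GCC would emit.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise c[i] = a[i] + b[i], unrolled by 4.
   Each pass of the main loop does four additions and advances i by 4. */
void add_arrays_unrolled4(int *c, const int *a, const int *b, size_t n)
{
    size_t i = 0;

    /* Main body: four independent additions per iteration. */
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Remainder loop for n not divisible by 4 (it does nothing when n == 16384). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With n = 16384 the main loop runs 4096 times and the remainder loop never executes; the remainder loop only matters for trip counts that are not multiples of 4.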