I was profiling the pre- and post-increment operators in C (out of curiosity, not for micro-optimization purposes!), and I got some surprising results. I expected post-increment to be slower, but all of my tests show that it's (non-trivially) faster. I've even looked at the assembly code generated with gcc's -S flag, and the post-increment version has one extra instruction (as expected). Can anyone explain this? I'm using gcc 4.8.1 on Arch Linux.

Here is my code.

EDIT: There is an integer overflow error in my code. It doesn't affect the question, but note that the actual number of iterations is different from the argument passed in.

post.c:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("Missing required argument\n");
        exit(1);
    }
    int iterations = atoi(argv[1]);

    int x = 1;
    int y;

    int i;
    for (i = 0; i < iterations; i++) {
        y = x++;
    }

    printf("%d\n", y);
}

pre.c:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("Missing required argument\n");
        exit(1);
    }
    int iterations = atoi(argv[1]);

    int x = 1;
    int y;

    int i;
    for (i = 0; i < iterations; i++) {
        y = ++x;
    }

    printf("%d\n", y);
}

Here are the results:

$ gcc post.c -o post
$ gcc pre.c -o pre
$ I=100000000000; time ./post $I; time ./pre $I
1215752192

real    0m2.777s
user    0m2.777s
sys     0m0.000s
1215752193

real    0m3.140s
user    0m3.137s
sys     0m0.003s

I also wrote a timing script a while ago. Running that gave the same results:

$ I=100000000000; comptime "./pre $I" "./post $I"
3193 3133 3157 3143 3133 3153 3147 3150 3143 3146 3143
2743 2767 2700 2727 2700 2697 2727 2710 2680 2783 2700
Mean 1: 3149.18
Mean 2: 2721.27
SD 1: 6.1800
SD 2: 21.2700

The output is mostly self-explanatory: the script runs both programs 10 times (by default) and reports the mean and standard deviation of the run times in milliseconds.

I've included the assembly code generated by gcc for post.c and pre.c on pastebin, since I didn't want to clutter this post with things that might be unnecessary.

Sorry for bringing this debate up again, but these numbers just seem strange to me.

1 Answer

Interesting question. I started this as a comment, but it got a bit long.

So, here's what I did. I ran pre and post like so:

$ for i in $(seq 1 10); do command time -f "%U" ./pre 1000000000 2>> pre.times; done
$ echo 'd=load("pre.times"); min(d), mean(d), max(d), std(d)' | octave -q

I got (I have replaced ans with labels to make it more readable):

min =  2.2900
avg =  2.3550
max =  2.4000
std =  0.040893

I ran post the same way and got:

min =  2.1900
avg =  2.2590
max =  2.3800
std =  0.055668

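In short, I see a much smaller difference on an Ivy Bridge CPU with gcc 4.6.3 on Ubuntu 12.04 amd64.

The disassembly shows only a slight reordering of the instructions, with no extra instruction (I wonder why you expected one).

My best guess is that this is a very subtle CPU issue, possibly a µop spending one extra cycle in the reservation station waiting for operand forwarding. It is certainly not very pronounced on my setup.

Answered 2013-08-30T02:46:07.857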