c - For loop discarded by compiler optimization

Question

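I am running an experiment to measure the execution time of a for loop on a microcontroller. The loop contains some integer and pointer operations.

Case 1: With the compiler optimization flag set to "none" (no optimization), assembly code is generated and I can measure the execution time.

Case 2: With compiler optimization set to "speed" (optimize for speed), no assembly code is generated for this loop. It looks like the compiler has thrown the for loop away.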

/* The basic concept behind this code is data manipulation in an array. I therefore created an array and then used loops to manipulate the data. */

int abc[1000];

for (n = 0; n < 1000; n++)
{
    abc[n] = 0xaa;
}
for (n = 2; n < 1000; n = n + 2)
{
    abc[n] = 0xbb;
}
for (n = 5; n < 1000; n = n + 2)
{
    for (i = (n + n); i < 1000; i++)
    {
        abc[i] = i;
    }
}

Can anyone explain why the compiler throws this loop away when I set the optimization flag to speed?

score 1 · Accepted Answer

The compiler looks at your code and sees that abc is set and never used. Some compilers give you a warning about this. Since abc is never used, the compiler optimizes it out, because what's the point of setting a variable if you never use it?

You could make abc volatile, but that would probably defeat the purpose of your test. Making the variable volatile tells the compiler it can't make any assumptions about its use. With a volatile variable the compiler may be unable to optimize the accesses to it, so the timing could end up much the same with and without optimizations.
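As a minimal sketch of the volatile approach, here are the loops from the question with the array declared volatile. The wrapper name fill_array_volatile and the ABC_LEN macro are made up for illustration, and the return statement is only there so the result can be checked.

```c
#define ABC_LEN 1000

/* Hypothetical wrapper around the loops from the question. */
int fill_array_volatile(void)
{
    volatile int abc[ABC_LEN];  /* volatile: every access must really be performed */
    int n, i;

    for (n = 0; n < ABC_LEN; n++)
        abc[n] = 0xaa;
    for (n = 2; n < ABC_LEN; n += 2)
        abc[n] = 0xbb;
    for (n = 5; n < ABC_LEN; n += 2)
        for (i = n + n; i < ABC_LEN; i++)
            abc[i] = i;

    return abc[ABC_LEN - 1];    /* last element, written by the nested loop */
}
```

The stores can no longer be dropped, but the compiler is also barred from combining or reordering them, so the measured time may not reflect what an optimized build of ordinary code would do.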

score 1 · Accepted Answer

If you don't use abc afterwards, the optimizer may recognize it (and every write to it) as "dead" and remove it completely.
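One way to keep the writes from being dead without volatile is to let the caller own the array, so the stores are visible outside the function. The sketch below assumes a hypothetical function fill_array that takes the buffer and its length as parameters; it is not from the question.

```c
#include <stddef.h>

/* Hypothetical variant of the question's loops: the caller owns the buffer,
   so the stores are visible outside this function. */
void fill_array(int *abc, size_t len)
{
    size_t n, i;

    for (n = 0; n < len; n++)
        abc[n] = 0xaa;
    for (n = 2; n < len; n += 2)
        abc[n] = 0xbb;
    for (n = 5; n < len; n += 2)
        for (i = n + n; i < len; i++)
            abc[i] = (int)i;
}
```

If the compiler can see the caller, inlines the function, and finds that the caller never reads the buffer, it may still remove the work, so it is worth checking the generated assembly.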

score 0 · Accepted Answer

Here are a couple of complete examples, since yours is incomplete enough that we cannot answer your question precisely.

unsigned int fun ( void )
{
    unsigned int x[8];
    unsigned int ra;
    unsigned int rb;

    for(ra=0;ra<5;ra++) x[ra]=ra;
    rb=0; for(ra=0;ra<5;ra++) rb+=x[ra];
    return(rb);
}

void more_fun ( void )
{
    unsigned int x[8];
    unsigned int ra;

    for(ra=0;ra<5;ra++) x[ra]=ra;
}

And an example of optimized compiler output:

00000000 <fun>:
   0:   e3a0000a    mov r0, #10
   4:   e12fff1e    bx  lr

00000008 <more_fun>:
   8:   e12fff1e    bx  lr

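Take the second function first. It is easy to see that ra and x are not used outside the function and that nothing they do produces anything of real value. It is dead code with unused variables, so it all goes away and the whole function is optimized to this: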

void more_fun ( void )
{
}

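As it should be.

I have taken this further with randomizers and other algorithms; sometimes the compiler figures it out and sometimes it doesn't. In this case it was easy.

So nothing that spins in fun() has any run-time value. It is all dead code: the result does not vary with any input or global state, the math is contained entirely within the function, and the exact answer can be precomputed. So the compiler computes the answer at compile time (0+1+2+3+4 = 10) and removes all the dead code, which effectively leaves: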

unsigned int fun ( void )
{
   return(10);
}
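For contrast, here is a sketch of a variant whose result cannot be folded into a single constant, because the loop bound is only known at run time. The name fun_runtime and the count parameter are assumptions added for illustration.

```c
/* Hypothetical variant of fun(): the loop bound arrives at run time,
   so there is no single constant answer to precompute. */
unsigned int fun_runtime(unsigned int count)
{
    unsigned int x[8];
    unsigned int ra;
    unsigned int rb = 0;

    if (count > 8) count = 8;                    /* stay inside x[] */
    for (ra = 0; ra < count; ra++) x[ra] = ra;
    for (ra = 0; ra < count; ra++) rb += x[ra];
    return rb;
}
```

A good optimizer may still eliminate the array or replace the loops with a closed-form expression, so this only rules out the "return a constant" outcome, not every transformation.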

If you want to burn time with a loop, or just want to see how loops are implemented, there are some things you can try.

void dummy ( unsigned int );
unsigned int fun ( void )
{
    unsigned int ra;
    volatile unsigned int rb;

    rb=0; for(ra=0;ra<5;ra++) rb+=ra;
    return(rb);
}

void more_fun ( void )
{
    unsigned int ra;
    for(ra=0;ra<5;ra++) dummy(ra);
}

This can give something like the following (compilers differ):

00000000 <fun>:
   0:   e3a03000    mov r3, #0
   4:   e24dd008    sub sp, sp, #8
   8:   e58d3004    str r3, [sp, #4]
   c:   e59d3004    ldr r3, [sp, #4]
  10:   e58d3004    str r3, [sp, #4]
  14:   e59d3004    ldr r3, [sp, #4]
  18:   e2833001    add r3, r3, #1
  1c:   e58d3004    str r3, [sp, #4]
  20:   e59d3004    ldr r3, [sp, #4]
  24:   e2833002    add r3, r3, #2
  28:   e58d3004    str r3, [sp, #4]
  2c:   e59d3004    ldr r3, [sp, #4]
  30:   e2833003    add r3, r3, #3
  34:   e58d3004    str r3, [sp, #4]
  38:   e59d3004    ldr r3, [sp, #4]
  3c:   e2833004    add r3, r3, #4
  40:   e58d3004    str r3, [sp, #4]
  44:   e59d0004    ldr r0, [sp, #4]
  48:   e28dd008    add sp, sp, #8
  4c:   e12fff1e    bx  lr

00000050 <more_fun>:
  50:   e92d4010    push    {r4, lr}
  54:   e3a04000    mov r4, #0
  58:   e1a00004    mov r0, r4
  5c:   e2844001    add r4, r4, #1
  60:   ebfffffe    bl  0 <dummy>
  64:   e3540005    cmp r4, #5
  68:   1afffffa    bne 58 <more_fun+0x8>
  6c:   e8bd4010    pop {r4, lr}
  70:   e12fff1e    bx  lr

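As you can see, the volatile solution is quite ugly. It tells the compiler to actually go out and touch RAM every time the variable is used: read it from RAM whenever you want to do something with it, and write it back after every step. The more_fun() solution does not rely on hoping the compiler honors volatile the way you want (a compiler honoring volatile on a local, dead variable seems wrong anyway). Instead, it forces the compiler to call an external function, one that is outside the optimization domain and therefore cannot be inlined or exposed as dead code (as it might be if, for example, dummy() did not use its input). At most, the optimizer might unroll the loop into something like: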

void more_fun ( void )
{   
   dummy(0);
   dummy(1);
   dummy(2);
   dummy(3);
   dummy(4);
}
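The answer only declares dummy(). As one possible sketch, it could be defined with an observable side effect so the calls stay meaningful even if the compiler can see the definition. The dummy_sink variable is hypothetical, and in practice dummy() would live in its own source file; it is shown next to more_fun() here only to keep the example in one piece.

```c
/* Hypothetical sink; dummy() would normally live in its own source file. */
volatile unsigned int dummy_sink;

void dummy(unsigned int value)
{
    dummy_sink = value;   /* volatile store: an observable side effect */
}

void more_fun(void)
{
    unsigned int ra;
    for (ra = 0; ra < 5; ra++) dummy(ra);
}
```

With the GNU tools, compiling to an object file with something like gcc -O2 -c and then running objdump -d on the result shows whether the loop and the calls survived.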

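The beauty of all this is that free tools like the GNU toolchain, although not the best compiler in terms of the tightest or fastest code (one size fits all fits no one), do compile to objects or binaries and come with a disassembler for those objects and binaries. So you can take these simple functions, examine what the compile options do, and start to understand what dead code is and what it looks like. It costs you nothing but time.

Most people who know something about this go for the volatile solution. But if you are trying to do some hand optimization, starting slowly and building up, volatile will distort your results rather than help you: it makes the processor overwork in an unnatural way. Without it, you can eventually arrive at real code that calls other functions in a loop, and its performance will differ dramatically depending on whether one of the variables is volatile.

Benchmarking is, in a word, bogus anyway. It is easy to manipulate the results even with the same source code, the same compiler, and the same computer. What matters is how your real program performs on its real task. Implement it in different ways, measure as best you can, add some margin, and then pick the faster one, or, if they are about the same, the one that is easier to read and/or maintain.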
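As a rough sketch of "measure as best you can", the following times repeated calls of a small workload and keeps each result live through a volatile store. It assumes a hosted environment where clock() from <time.h> is available; on a bare microcontroller you would read a hardware timer instead. The names work, time_work, and bench_sink are all made up for illustration.

```c
#include <time.h>

volatile unsigned int bench_sink;  /* hypothetical sink so each result is used */

/* Hypothetical workload: sum of squares below count. */
static unsigned int work(unsigned int count)
{
    unsigned int ra, rb = 0;
    for (ra = 0; ra < count; ra++) rb += ra * ra;
    return rb;
}

/* Returns the elapsed CPU time, in clock ticks, for `repeats` calls of work(). */
clock_t time_work(unsigned int count, unsigned int repeats)
{
    unsigned int r;
    clock_t start = clock();

    for (r = 0; r < repeats; r++)
        bench_sink = work(count + r);  /* input changes on every call */
    return clock() - start;
}
```

Dividing the returned value by CLOCKS_PER_SEC gives seconds. The compiler is still free to simplify work() itself, so compare the number against the disassembly before trusting it.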
