architecture - What counts as a FLOP?

Question

Suppose I have a pseudo-C program:

For i=0 to 10
    x++
    a=2+x*5
next

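Is the FLOP count for this (1 [x++] + 1 [x*5] + 1 [2+(x*5)]) * 10 [loop], for a total of 30 FLOPs? I'm having trouble understanding what counts as a FLOP.

Note that [...] indicates where I'm getting my count of "operations" from.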

score 10 · Accepted Answer

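For the purposes of FLOPS measurements, usually only additions and multiplications are included. Things like divisions, reciprocals, square roots, and transcendental functions are too expensive to count as a single operation, while things like loads and stores are too trivial.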

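In other words, your loop body contains 2 additions and 1 multiplication, so (assuming x is floating point) each loop iteration is 3 operations; if you run the loop 10 times, you've done 30 operations.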
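As a rough illustration of this counting rule, the sketch below tallies the additions and multiplications in the question's loop by hand. It assumes x is a float that starts at 0.0 and that the loop body runs exactly 10 times; the function name `count_loop_flops` is made up for this example.

```python
def count_loop_flops(iterations=10):
    """Tally float additions and multiplications in the question's loop body."""
    adds = 0
    muls = 0
    x = 0.0  # assumption: x is a float that starts at 0.0
    a = 0.0
    for _ in range(iterations):
        x = x + 1.0   # x++      -> 1 addition
        adds += 1
        t = x * 5.0   # x*5      -> 1 multiplication
        muls += 1
        a = 2.0 + t   # 2+(x*5)  -> 1 addition
        adds += 1
    return adds, muls, adds + muls


adds, muls, total = count_loop_flops(10)
print(adds, muls, total)  # 20 additions, 10 multiplications, 30 operations
```

Loads, stores, and the loop bookkeeping are deliberately not tallied, matching the rule that only additions and multiplications count.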

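Note that when measuring MIPS, your loop will be more than 3 instructions, because it also includes loads and stores that the FLOPS measurement doesn't count.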

score 7 · Accepted Answer

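FLOPS stands for floating-point operations per second. If you are dealing with integers, then you don't have any floating-point operations in your code.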

score 3 · Accepted Answer

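The other posters have made it clear that FLOPS (detailed here) is concerned with floating-point (as opposed to integer) operations per second, so you have to count not only how many operations you are performing, but also in what period of time.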
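To make the "per second" part concrete, here is a minimal sketch that divides an operation count by elapsed time. The helper names `flops_rate` and `time_loop` are hypothetical, and timing an interpreted Python loop mostly measures interpreter overhead, so the printed figure says little about the hardware's real floating-point throughput.

```python
import time


def flops_rate(op_count, seconds):
    """Floating-point operations per second: operation count over elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return op_count / seconds


def time_loop(iterations):
    """Run the question's loop; return (operation count, elapsed seconds)."""
    x = 0.0  # assumption: x and a are floats
    a = 0.0
    start = time.perf_counter()
    for _ in range(iterations):
        x = x + 1.0        # 1 addition
        a = 2.0 + x * 5.0  # 1 multiplication + 1 addition
    elapsed = time.perf_counter() - start
    return 3 * iterations, elapsed


ops, elapsed = time_loop(100_000)
if elapsed > 0:
    print(f"{ops} operations in {elapsed:.4f} s = {flops_rate(ops, elapsed):.0f} FLOPS")
```

The same 30 operations give a different FLOPS figure depending on how long they take: `flops_rate(30, 2.0)` is 15.0, while `flops_rate(30, 0.5)` is 60.0.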

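If "x" and "a" are floats, you are making a good attempt at counting the number of operations in your code, but you would have to check the object code to be certain how many floating-point instructions are actually used. For instance, if "a" is not used subsequently, an optimising compiler might not bother to calculate it.

Also, some floating-point operations (such as addition) might be much quicker than others (such as multiplication), so on the same machine a loop of only float additions could run at many more FLOPS than a loop of only float multiplications.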

score 3 · Accepted Answer

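FLOPs (the lowercase s indicates the plural of FLOP, per Martinho Fernandes's comment) refer to machine-language floating-point instructions, so the count depends on how many instructions your code compiles down to.

First off, if all of these variables are integers, then there are no FLOPs in this code. Let's assume, however, that your language treats all of these constants and variables as single-precision floating-point values (using single precision makes loading the constants easier).

This code could compile to (on MIPS):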

Assignment of variables: x is in $f1, a is in $f2, i is in $f3.
All other floating point registers are compiler-generated temporaries.
$f4 stores the loop exit condition of 10.0
$f5 stores the floating point constant 1.0
$f6 stores the floating point constant 2.0
$t1 is an integer register used for loading constants
    into the floating point coprocessor.

     lui $t1, *upper half of 0.0*
     ori $t1, $t1,  *lower half of 0.0*
     mtc1 $t1, $f3                        # i = 0.0 (move integer register to FPU)
     lui $t1, *upper half of 10.0*
     ori $t1, $t1,  *lower half of 10.0*
     mtc1 $t1, $f4                        # loop exit condition = 10.0
     lui $t1, *upper half of 1.0*
     ori $t1, $t1,  *lower half of 1.0*
     mtc1 $t1, $f5                        # constant 1.0
     lui $t1, *upper half of 2.0*
     ori $t1, $t1,  *lower half of 2.0*
     mtc1 $t1, $f6                        # constant 2.0
st:  c.gt.s $f3, $f4                      # floating-point compare: i > 10.0?
     bc1t end                             # if so, leave the loop
     add.s $f1, $f1, $f5                  # x++           (FLOP)
     lui $t1, *upper half of 5.0*
     ori $t1, $t1,  *lower half of 5.0*
     mtc1 $t1, $f2                        # a = 5.0
     mul.s $f2, $f2, $f1                  # a = 5.0 * x   (FLOP)
     add.s $f2, $f2, $f6                  # a = a + 2.0   (FLOP)
     add.s $f3, $f3, $f5                  # i++           (FLOP)
     j st
end: # first statement after the loop

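So according to Gabe's definition, there are 4 FLOPs inside the loop (3x add.s and 1x mul.s). There are 5 FLOPs if you also count the loop comparison c.gt.s. Multiply this by 10 and the program uses a total of 40 (or 50) FLOPs.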
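The tally above can be reproduced mechanically. The sketch below lists only the floating-point mnemonics from one pass through the loop of the listing and counts them; the names `LOOP_FP_OPS`, `ARITHMETIC`, and `flops_per_iteration` are invented for this illustration, and the 10 iterations are taken from the answer's own arithmetic.

```python
from collections import Counter

# Floating-point mnemonics in one pass through the loop of the listing above.
# Integer, constant-loading, and branch instructions are left out: none are FLOPs.
LOOP_FP_OPS = ["c.gt.s", "add.s", "mul.s", "add.s", "add.s"]
ARITHMETIC = {"add.s", "mul.s"}  # what the add/multiply-only definition counts


def flops_per_iteration(mnemonics, count_compares=False):
    """Count arithmetic FLOPs, optionally including c.* comparisons."""
    tally = Counter(mnemonics)
    total = sum(n for op, n in tally.items() if op in ARITHMETIC)
    if count_compares:
        total += sum(n for op, n in tally.items() if op.startswith("c."))
    return total


ITERATIONS = 10  # assumption: the loop body runs 10 times, as in the answer
print(flops_per_iteration(LOOP_FP_OPS) * ITERATIONS)        # 40
print(flops_per_iteration(LOOP_FP_OPS, True) * ITERATIONS)  # 50
```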

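A better optimizing compiler might recognize that the value of a isn't used inside the loop, so it only needs to compute the final value of a. It could generate code that looks like this: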

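     lui $t1, *upper half of 0.0*
     ori $t1, $t1,  *lower half of 0.0*
     mtc1 $t1, $f3                        # i = 0.0 (move integer register to FPU)
     lui $t1, *upper half of 10.0*
     ori $t1, $t1,  *lower half of 10.0*
     mtc1 $t1, $f4                        # loop exit condition = 10.0
     lui $t1, *upper half of 1.0*
     ori $t1, $t1,  *lower half of 1.0*
     mtc1 $t1, $f5                        # constant 1.0
     lui $t1, *upper half of 2.0*
     ori $t1, $t1,  *lower half of 2.0*
     mtc1 $t1, $f6                        # constant 2.0
st:  c.gt.s $f3, $f4                      # floating-point compare: i > 10.0?
     bc1t end                             # if so, leave the loop
     add.s $f1, $f1, $f5                  # x++           (FLOP)
     add.s $f3, $f3, $f5                  # i++           (FLOP)
     j st
end: lui $t1, *upper half of 5.0*
     ori $t1, $t1,  *lower half of 5.0*
     mtc1 $t1, $f2                        # a = 5.0
     mul.s $f2, $f2, $f1                  # a = 5.0 * x   (FLOP, executed once)
     add.s $f2, $f2, $f6                  # a = a + 2.0   (FLOP, executed once)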

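In this case, there are 2 additions and 1 comparison inside the loop (multiplied by 10, that gives 20 or 30 FLOPs), plus 1 multiplication and 1 addition outside the loop. Thus, your program now takes 22 or 32 FLOPs, depending on whether we count comparisons.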
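The effect of moving the computation of a out of the loop can be checked with a small simulation. This is only a sketch: the function names `naive` and `hoisted` are made up, x is assumed to start at 0.0, the loop is assumed to run 10 times, and comparisons are not counted.

```python
def naive(iterations=10):
    """Recompute a on every pass, as in the unoptimized listing."""
    x, i, a, flops = 0.0, 0.0, 0.0, 0
    for _ in range(iterations):
        x += 1.0     # add.s: x++
        a = 5.0 * x  # mul.s
        a = a + 2.0  # add.s
        i += 1.0     # add.s: float loop counter
        flops += 4
    return a, flops


def hoisted(iterations=10):
    """Compute a once after the loop, as in the optimized listing."""
    x, i, flops = 0.0, 0.0, 0
    for _ in range(iterations):
        x += 1.0     # add.s: x++
        i += 1.0     # add.s: float loop counter
        flops += 2
    a = 5.0 * x      # mul.s, executed once
    a = a + 2.0      # add.s, executed once
    flops += 2
    return a, flops


print(naive())    # (52.0, 40)
print(hoisted())  # (52.0, 22)
```

Both versions end with the same value of a, but the second performs 18 fewer floating-point operations, which is why counting FLOPs from the source alone can overstate what the compiled program actually executes.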

score 1 · Accepted Answer

Is x an integer or a floating-point variable? If it's an integer, then your loop may not contain any FLOPs at all.
