c - Long double floating-point error

Question

I wrote a program in C to get a feel for how large the floating-point error becomes under repeated division.

#include <stdio.h>

int main (int argc, char* argv[]) {
    if (argc < 3) {
        printf("Enter a decimal number as the first positional " 
                "argument\n");
        printf("Enter the maximum number of digits to print as the " 
                "second positional argument\n");
        return 0;
    }   

    long double d;
    sscanf(argv[1], "%Lf", &d);
    int m;
    sscanf(argv[2], "%d", &m);

    int i;
    char format[10];
    for (i = 1; i <= m; ++i) {
        printf("(%d digits)\n", i); 
        sprintf(format, "%%.%dLf\n\n", i); 
        printf(format, d); 
    }   

    long double p = d;
    printf("\n");
    for (i = 1; i <= m; ++i) {
        printf("(%Lf/10e%d with %d digits)\n", d, i, m); 
        p = p/(long double)10.0;
        printf(format, p); 
    }
    return 0;
}

Here is one line of the output when the program is run with the following arguments:

$ fpe 0.1 700
.
.
.
(0.100000/10e180 with 700 digits)
0.0000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000999999999999999999969819570700939858153376
736698732853283605408116087882762948991724868957176649769045358705872354052
261113540314114885779914335315639806061208847920179776799404948795506248532
485303630811119507604985596684233990126219304092175565232198569923253737561
276484626462077772036038845251286782974821021132356946292172207615386395848
331484216638642723800290357587296443408362280895970909637712494349003491485
594533190659822910753768473307578901199121901299804449081420898437500000000
000000000000000000000000000
.
.
.

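Here we observe 485 digits of floating-point noise. This was compiled with gcc 4.4.3, which I assume uses 80-bit extended precision. However, 485 decimal digits is far more than 80 bits of information. So my question is: where does this information come from?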

score 5 · Accepted Answer

No extra information is being printed. The value printed is exactly the value of p.

After 180 iterations, p is +0x1.A8E90F9908E0CA56p-602, that is, 15309010345804195115•2^-665. The IEEE 754 standard defines the value of a floating-point number as the sign (+1 or -1) multiplied by an integer power of 2 (determined by the number's exponent field) multiplied by the value of its significand (the fraction part). So every floating-point number has one specific value. The above is the value of p. In decimal, that value is exactly .9999999999999999999698195707009398581533767366987328532836054081160878827629489917248689571766497690453587058723540522611135403141148857799143353156398060612088479201797767994049487955062485324853036308111195076049855966842339901262193040921755652321985699232537375612764846264620777720360388452512867829748210211323569462921722076153863958483314842166386427238002903575872964434083622808959709096377124943490034914855945331906598229107537684733075789011991219012998044490814208984375•10^-181.

That is the value your program produced. So your output formatter has printed p exactly. It did its job well.

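In fact, the floating-point arithmetic has done well in every respect. This value is the long double closest to 10^-181. It is impossible to get any closer in a long double. So, even after hundreds of arithmetic operations, the error has not grown.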

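There is no new information here. If we were told the bits that represent p, we could produce the same hundreds of decimal digits. They do not tell you anything new. However, they are not garbage either. They are completely determined by the value of p.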

score 0 · Accepted Answer

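To add a bit more information to Eric's excellent answer: the 181st iteration, computed the way you did it, happens to be the long double nearest to 10^-181, but this does not hold for every n...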

For example, 1/10.0/10.0/10.0/10.0 != 1/10000.0 when computed in long double.

Using my own floating-point emulation package in Squeak Smalltalk, http://code.google.com/p/arbitrary-precision-float/, I can tell that among the first 300 values of 10^-n, 77 are the nearest long double and 223 are not.

(1 to: 300) count: [:n |
    ((1 to: n) inject: (1 asArbitraryPrecisionFloatNumBits: 64) into: [:p :i | p/10])
    ~= ((10 raisedTo: n negated) asArbitraryPrecisionFloatNumBits: 64)]

The difference peaks at 4 ulp, for 10^-218.

(1 to: 300) detectMax: [:n |
    (((1 to: n) inject: (1 asArbitraryPrecisionFloatNumBits: 64) into: [:p :i | p/10])
    - ((10 raisedTo: n negated) asArbitraryPrecisionFloatNumBits: 64)) abs
    / (2 raisedTo: -63+((10 raisedTo: n negated) floorLog: 2))].

Here is how the error evolves, in ulps:

(1 to: 300) collect: [:n |
    ((((1 to: n) inject: (1 asArbitraryPrecisionFloatNumBits: 64) into: [:p :i | p/10])
    - ((10 raisedTo: n negated) asArbitraryPrecisionFloatNumBits: 64))
    / (2 raisedTo: -63+((10 raisedTo: n negated) floorLog: 2))) asInteger].

#(0  0  0 -1 -1 -1 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -2 -2
 -1 -1 -1 -1 -1 -1  0 -1  0  0  0  1  0  0  0  0  1  0  0  0
  0  0  0 -1  0  0  0  1  0  0  0  0  0  0  1  1  1  1  0  0
  0  0 -1  0 -1 -1 -1 -1 -2 -1 -1 -2 -2 -2 -3 -2 -2 -3 -2 -2
 -3 -2 -2 -2 -2 -1 -2 -1 -1 -2 -2 -2 -1 -2 -2 -1 -2 -2 -1 -2
 -2 -2 -3 -2 -1 -2 -2 -1 -2 -2 -1 -2 -2 -2 -3 -2 -2 -1 -1 -1
 -1 -1 -1 -1 -2 -2 -1 -3 -2 -2 -3 -2 -2 -3 -3 -2 -2 -2 -2 -3
 -2 -2 -3 -3 -2 -3 -2 -2 -2 -3 -2 -2 -3 -2 -1 -2 -2 -1 -2 -1
 -1 -2 -1 -1 -1  0  0  0  0  0  1  1  0  0  0  0  0  1  0  0
  0  0  0  0 -1 -1 -1 -1  0  0  0  0 -1 -1 -1 -2 -1  0 -1 -1
 -1 -1 -1 -2 -1 -1 -1 -1 -2 -2 -2 -2 -2 -2 -3 -3 -2 -4 -3 -2
 -3 -2 -2 -3 -2 -2 -2 -2 -1 -3 -2 -2 -3 -3 -2 -1 -2 -2 -1 -2
 -2 -1 -3 -2 -2 -3 -3 -2 -3 -2 -1 -1 -1  0  0  0  0  0 -1  0
  0 -1  0  0 -1  0  0  0  0  0 -1 -1  0  0 -1 -1 -1 -1  0  0
  0  1  1  1  0  1  1  1  1  0  1  1  1  1  1  1  0  0  0  0)
