python - Strange behaviour with floats and string conversion

Question

I typed this into the Python shell:

>>> 0.1*0.1
0.010000000000000002

I expected that 0.1*0.1 would not be exactly 0.01, because I know that 0.1 in base 10 has a periodic (repeating) expansion in base 2.

>>> len(str(0.1*0.1))
4

I expected to get 20, since I see 20 characters above. Why do I get 4?

>>> str(0.1*0.1)
'0.01'

OK, this explains why len gives me 4, but why does str return '0.01'?

>>> repr(0.1*0.1)
'0.010000000000000002'

Why does str round while repr does not? (I have read this answer, but I would like to know how it is decided when str rounds a float and when it does not.)

>>> str(0.01) == str(0.0100000000001)
False
>>> str(0.01) == str(0.01000000000001)
True

So this seems to be a problem with the accuracy of floats. I thought Python would use IEEE 754 single-precision floats, so I checked it like this:

#include <stdint.h>
#include <stdio.h> // printf

union myUnion {
    uint32_t i; // unsigned integer 32-bit type (on every machine)
    float f;    // a type you want to play with
};

int main() {
    union myUnion testVar;
    testVar.f = 0.01000000000001f;
    printf("%f\n", testVar.f);

    testVar.f = 0.01000000000000002f;
    printf("%f\n", testVar.f);

    testVar.f = 0.01f*0.01f;
    printf("%f\n", testVar.f);
}

I got:

0.010000
0.010000
0.000100

Python gave me:

>>> 0.01000000000001
0.010000000000009999
>>> 0.01000000000000002
0.010000000000000019
>>> 0.01*0.01
0.0001

Why does Python give me these results?

(I use Python 2.6.5. If you know of differences between Python versions, I would also be interested in them.)

score 16 · Accepted Answer

The crucial requirement on repr is that it should round-trip; that is, eval(repr(f)) == f should give True in all cases.
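The round-trip property can be checked directly. This is a minimal sketch: the sample values are arbitrary, and float() stands in for eval() as the parser, which is an assumption made here for safety rather than something the answer prescribes.

```python
# Arbitrary sample values chosen for illustration.
samples = [0.1, 0.01, 0.1 * 0.1, 1.0 / 3.0, 1e22, 5e-324]

for f in samples:
    text = repr(f)
    # Parsing the repr string must give back exactly the same float.
    assert float(text) == f
    print(text)
```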

In Python 2.x (before 2.7) repr works by doing a printf with format %.17g and discarding trailing zeroes. This is guaranteed correct (for 64-bit floats) by IEEE-754. Since 2.7 and 3.1, Python uses a more intelligent algorithm that can find shorter representations in some cases where %.17g gives unnecessary non-zero terminal digits or terminal nines. See What's new in 3.1? and issue 1580.
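The two behaviours can be compared side by side on a newer interpreter. This sketch assumes Python 2.7 or 3.1+, where format(x, '.17g') reproduces the older %.17g style and repr uses the shorter form; the variable names are illustrative only.

```python
value = 0.1

legacy_style = format(value, '.17g')  # 17 significant digits, as in the old repr
modern_style = repr(value)            # shortest round-tripping string (2.7 / 3.1+)

print(legacy_style)
print(modern_style)

# Both strings parse back to exactly the same float.
assert float(legacy_style) == float(modern_style) == value
```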

Even under Python 2.7, repr(0.1 * 0.1) gives "0.010000000000000002". This is because 0.1 * 0.1 == 0.01 is False under IEEE-754 parsing and arithmetic; that is, the nearest 64-bit floating-point value to 0.1, when multiplied by itself, yields a 64-bit floating-point value that is not the nearest 64-bit floating-point value to 0.01:

>>> 0.1.hex()
'0x1.999999999999ap-4'
>>> (0.1 * 0.1).hex()
'0x1.47ae147ae147cp-7'
>>> 0.01.hex()
'0x1.47ae147ae147bp-7'
                 ^ 1 ulp difference
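The 1 ulp gap can also be measured by comparing the raw 64-bit patterns. This is a sketch: float_bits is a hypothetical helper, and it assumes the platform float is an IEEE-754 double (true for CPython on mainstream platforms).

```python
import struct

def float_bits(x):
    """Return the raw 64-bit pattern of a float as an integer (hypothetical helper)."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

product = 0.1 * 0.1
literal = 0.01

print(hex(float_bits(product)))
print(hex(float_bits(literal)))

# For two positive doubles, the difference of the bit patterns counts
# how many representable values apart they are.
ulp_gap = float_bits(product) - float_bits(literal)
print(ulp_gap)
```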

The difference between repr and str (pre-2.7/3.1) is that str formats with 12 significant digits (%.12g) as opposed to 17, which is non-round-trippable but produces more readable results in many cases.
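The effect of the shorter format on the values from the question can be reproduced on any version with format(). This is a sketch: legacy_str is a hypothetical helper that assumes the old str behaved like %.12g, and it ignores the ".0" suffix added to integer-valued floats. Note that on Python 3.2 and later, str of a float is the same as repr, so calling str directly will not show this.

```python
def legacy_str(x):
    """Approximate the old str(float): 12 significant digits (hypothetical helper)."""
    return format(x, '.12g')

print(legacy_str(0.1 * 0.1))

# The extra digit falls beyond the 12th significant digit and is rounded away...
assert legacy_str(0.01) == legacy_str(0.01000000000001)
# ...whereas one digit earlier it still survives.
assert legacy_str(0.01) != legacy_str(0.0100000000001)
```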

score 5 · Answer

I can confirm your behaviour

ActivePython 2.6.4.10 (ActiveState Software Inc.) based on
Python 2.6.4 (r264:75706, Jan 22 2010, 17:24:21) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> repr(0.1)
'0.10000000000000001'
>>> repr(0.01)
'0.01'

Now, the docs claim that in Python <2.7

the value of repr(1.1) was computed as format(1.1, '.17g')

This is a slight simplification.

Note that this is all to do with the string formatting code -- in memory, all Python floats are just stored as C doubles, so there is never going to be a difference between them.
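The storage format can be inspected from Python itself. This sketch uses sys.float_info and struct from the standard library; the two test values come from the question, and the single-precision comparison is an added illustration of why a C float cannot tell them apart.

```python
import struct
import sys

print(sys.float_info.mant_dig)  # significand bits of a Python float
print(struct.calcsize('d'))     # size in bytes of a C double

a = 0.01
b = 0.01000000000001

# As doubles the two values are distinct...
assert a != b
# ...but squeezed into single precision ('f') they become identical.
assert struct.pack('f', a) == struct.pack('f', b)
```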

Also, it's kind of unpleasant to work with the full-length string for a float even if you know that there's a better one. Indeed, in modern Pythons a new algorithm is used for float formatting, that picks the shortest representation in a smart way.
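To give a feel for what "shortest representation" means, here is a naive search. It is only a sketch: shortest_roundtrip is a hypothetical helper, not the algorithm CPython actually uses, and its output can differ from repr in layout (for example in when exponent notation kicks in).

```python
def shortest_roundtrip(x):
    """Naive search for the shortest %g-style string that parses back to x.

    Hypothetical helper for illustration only; the real implementation
    is far more efficient.
    """
    for precision in range(1, 18):
        text = '%.*g' % (precision, x)
        if float(text) == x:
            return text
    return text  # 17 significant digits always suffice for a 64-bit float

print(shortest_roundtrip(0.1))
print(shortest_roundtrip(0.1 * 0.1))
```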

I spent a while looking this up in the source code, so I'll include the details here in case you're interested. You can skip this section.

In floatobject.c, we see

static PyObject *
float_repr(PyFloatObject *v)
{
    char buf[100];
    format_float(buf, sizeof(buf), v, PREC_REPR);

    return PyString_FromString(buf);
}

which leads us to look at format_float. Omitting the NaN/inf special cases, it is:

static void
format_float(char *buf, size_t buflen, PyFloatObject *v, int precision)
{
    register char *cp;
    char format[32];
    int i;

    /* Subroutine for float_repr and float_print.
       We want float numbers to be recognizable as such,
       i.e., they should contain a decimal point or an exponent.
       However, %g may print the number as an integer;
       in such cases, we append ".0" to the string. */

    assert(PyFloat_Check(v));
    PyOS_snprintf(format, 32, "%%.%ig", precision);
    PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
    cp = buf;
    if (*cp == '-')
        cp++;
    for (; *cp != '\0'; cp++) {
        /* Any non-digit means it's not an integer;
           this takes care of NAN and INF as well. */
        if (!isdigit(Py_CHARMASK(*cp)))
            break;
    }
    if (*cp == '\0') {
        *cp++ = '.';
        *cp++ = '0';
        *cp++ = '\0';
        return;
    }

    <some NaN/inf stuff>
}

We can see that this first initialises some variables and checks that v is a well-formed float. It then prepares a format string:

PyOS_snprintf(format, 32, "%%.%ig", precision);

Now PREC_REPR is defined elsewhere in floatobject.c as 17, so this computes to "%.17g". Now we call

PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);

With the end of the tunnel in sight, we look up PyOS_ascii_formatd and discover that it uses snprintf internally.
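Putting the walkthrough together, the logic can be approximated in a few lines of Python. This is a rough sketch rather than a port: legacy_float_repr is a hypothetical name, the default precision of 17 mirrors PREC_REPR, and the NaN/inf handling is omitted just as in the excerpt.

```python
def legacy_float_repr(value, precision=17):
    """Rough Python rendering of the format_float logic (hypothetical helper)."""
    # Equivalent of building "%.17g" and formatting the double with it.
    text = '%.*g' % (precision, value)
    digits = text[1:] if text.startswith('-') else text
    # %g may print a float as a bare integer; append ".0" in that case.
    if digits.isdigit():
        text += '.0'
    return text

print(legacy_float_repr(0.1))
print(legacy_float_repr(0.01))
print(legacy_float_repr(1.0))
```

With precision=12 the same helper approximates the old str, which is why the two could disagree on the same float.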

score 1 · Answer

Answered 2012-11-12T14:29:24.403
