pandas - pandas.DataFrame.to_markdown converts large integers to floats

Question

pandas.DataFrame.to_markdown converts large ints to floats. Is this a bug or a feature? Is there a workaround?

>>> df = pd.DataFrame({"A": [123456, 123456]})
>>> print(df.to_markdown())
|    |      A |
|---:|-------:|
|  0 | 123456 |
|  1 | 123456 |

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> print(df.to_markdown())
|    |           A |
|---:|------------:|
|  0 | 1.23457e+06 |
|  1 | 1.23457e+06 |

>>> print(df)
         A
0  1234567
1  1234567

>>> print(df.A.dtype)
int64

score 0 · Accepted Answer

At first I only found a workaround, not an explanation: convert the column to strings.

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> df["A"] = df.A.astype(str)
>>> print(df.to_markdown())
|    |       A |
|---:|--------:|
|  0 | 1234567 |
|  1 | 1234567 |

Update:

I think this is caused by two factors:

The first factor is the _column_type function in tabulate:

def _column_type(strings, has_invisible=True, numparse=True):
    """The least generic type all column values are convertible to.

It can be worked around by disabling the conversion with tablefmt="pretty":

>>> print(df.to_markdown(tablefmt="pretty"))
+---+---------+
|   |    A    |
+---+---------+
| 0 | 1234567 |
| 1 | 1234567 |
+---+---------+

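The second factor applies when there is more than one column and one of them contains float numbers. tabulate extracts the data through df.values, which converts the DataFrame to a numpy.array, so all values are converted to one common dtype (float). This is also discussed in this issue.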

>>> df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})
>>> print(df)
         A    B
0  1234567  0.1
1  1234567  0.2

>>> print(df.A.dtype)
int64

>>> print(df.to_markdown(tablefmt="pretty"))
+---+-----------+-----+
|   |     A     |  B  |
+---+-----------+-----+
| 0 | 1234567.0 | 0.1 |
| 1 | 1234567.0 | 0.2 |
+---+-----------+-----+

>>> df.values
array([[1.234567e+06, 1.000000e-01],
       [1.234567e+06, 2.000000e-01]])
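The upcast can be checked directly by comparing the per-column dtypes with the dtype of the array that df.values returns. A small sketch using the same frame as above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})

# Per-column dtypes: column A is still an integer column.
print(df.dtypes)

# df.values builds one 2-D array, so both columns share a single dtype.
print(df.values.dtype)

# The integer from column A comes back as a float once it is in that array.
first_cell = df.values[0, 0]
print(type(first_cell).__name__, first_cell)
```

Casting the integer column to strings first, as in the workaround at the top of this answer, avoids the shared float dtype because the resulting array can no longer be purely numeric.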

score 0 · Accepted Answer

If you check the pandas options, the default number of significant digits is 6.

import pandas as pd

pd.describe_option("display.precision")

display.precision : int
    Floating point output precision (number of significant digits). This is
    only a suggestion
    [default: 6] [currently: 6]
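The switch from 123456 to 1.23457e+06 can also be reproduced without pandas. A small sketch, assuming the cell ends up being formatted as a float with Python's general "g" format (I believe this is tabulate's default floatfmt, but treat that as an assumption), which keeps 6 significant digits unless told otherwise:

```python
values = [123456, 1234567]

# Format each value as a float under the "g" presentation type,
# which defaults to 6 significant digits.
rendered = [format(float(v), "g") for v in values]

for v, text in zip(values, rendered):
    print(v, "->", text)
```

A six-digit integer still fits in 6 significant digits and prints in full, while a seven-digit one is rounded and shown in scientific notation, which matches the tables in the question.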
