python - Rounding entries in a Pandas DataFrame

Question

Using:

newdf3.pivot_table(rows=['Quradate'],aggfunc=np.mean)

produces:

            Alabama_exp  Credit_exp  Inventory_exp  National_exp  Price_exp  Sales_exp
Quradate
2010-01-15     0.568003    0.404481       0.488601      0.483097   0.431211   0.570755
2010-04-15     0.543620    0.385417       0.455078      0.468750   0.408203   0.564453

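I'd like to round the decimal numbers to two digits and multiply by 100, e.g. .568003 should be 57. I've been fiddling with this for a while to no avail; I tried this: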

newdf3.pivot_table(rows=['Quradate'],aggfunc=np.mean).apply(round(2)) #and got:
TypeError: ("'float' object is not callable", u'occurred at index Alabama_exp')

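I tried a number of other approaches to no avail; most complained that the item was not a float. I see that the Pandas Series object has a round method but DataFrame does not. I tried using df.apply, but it complained about the float issue.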

score 20 · Accepted Answer

Just use numpy.round, e.g.:

100 * np.round(newdf3.pivot_table(rows=['Quradate'], aggfunc=np.mean), 2)

As long as round is appropriate for all column types, this works on a DataFrame.

With some data:

In [9]: dfrm
Out[9]:
          A         B         C
0 -1.312700  0.760710  1.044006
1 -0.792521 -0.076913  0.087334
2 -0.557738  0.982031  1.365357
3  1.013947  0.345896 -0.356652
4  1.278278 -0.195477  0.550492
5  0.116599 -0.670163 -1.290245
6 -1.808143 -0.818014  0.713614
7  0.233726  0.634349  0.561103
8  2.344671 -2.331232 -0.759296
9 -1.658047  1.756503 -0.996620

In [10]: 100*np.round(dfrm, 2)
Out[10]:
     A    B    C
0 -131   76  104
1  -79   -8    9
2  -56   98  137
3  101   35  -36
4  128  -20   55
5   12  -67 -129
6 -181  -82   71
7   23   63   56
8  234 -233  -76
9 -166  176 -100
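
Applied to the original goal (0.568003 should become 57), this might look like the minimal sketch below. It assumes a recent pandas and NumPy; the small `pivot` frame is a hypothetical stand-in for the pivot table, built from two columns of the question's output. The final `round()` before `astype(int)` is an addition of this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pivot table in the question.
pivot = pd.DataFrame(
    {"Alabama_exp": [0.568003, 0.543620], "Credit_exp": [0.404481, 0.385417]},
    index=pd.Index(["2010-01-15", "2010-04-15"], name="Quradate"),
)

# The approach from this answer: round to two decimals, then scale by 100.
scaled = 100 * np.round(pivot, 2)

# 100 * 0.57 is not exactly 57 in floating point, so round once more
# before converting to integers; astype(int) alone would truncate.
percent = scaled.round().astype(int)
print(percent)
```

Without the extra `round()`, a value such as 56.99999999999999 would be truncated to 56 by `astype(int)`.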

score 8 · Accepted Answer

As of Pandas 0.17, DataFrames have a "round" method:

df = newdf3.pivot_table(rows=['Quradate'], aggfunc=np.mean)
df.round()

This even lets you use a different precision for each column:

df.round({'Alabama_exp':2, 'Credit_exp':3})
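
A small self-contained sketch of both call styles, assuming a pandas version that provides `DataFrame.round`; the frame is a hypothetical one reusing two columns from the question's output. Note that `df.round()` with no argument rounds to zero decimal places, so pass the number of decimals explicitly when you want two.

```python
import pandas as pd

# Hypothetical frame reusing two columns from the question's output.
df = pd.DataFrame(
    {"Alabama_exp": [0.568003, 0.543620], "Credit_exp": [0.404481, 0.385417]}
)

# Same precision for every column.
uniform = df.round(2)

# Per-column precision, given as a {column name: decimals} mapping.
mixed = df.round({"Alabama_exp": 2, "Credit_exp": 3})
print(mixed)
```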

score 5 · Accepted Answer

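For a moderately large DataFrame, applymap will be very slow, because it applies a Python function element by element in Python (i.e., there is no Cython to speed it up). It is faster to use apply with functools.partial: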

In [22]: from functools import partial

In [23]: df = DataFrame(randn(100000, 20))

In [24]: f = partial(Series.round, decimals=2)

In [25]: timeit df.applymap(lambda x: round(x, 2))
1 loops, best of 3: 2.52 s per loop

In [26]: timeit df.apply(f)
10 loops, best of 3: 33.4 ms per loop

You can even create a function that returns a partial function that you can then apply:

In [27]: def column_round(decimals):
   ....:     return partial(Series.round, decimals=decimals)
   ....:

In [28]: df.apply(column_round(2))
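
The session above relies on names (`DataFrame`, `Series`, `randn`) imported earlier. A self-contained sketch of the same idea with explicit imports, assuming a recent pandas and NumPy, could look like this; the frame size and the seed are arbitrary choices for illustration, and no timings are claimed.

```python
from functools import partial

import numpy as np
import pandas as pd

# Arbitrary random data for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 5)))

# partial fixes decimals=2, leaving a function of a single Series.
round2 = partial(pd.Series.round, decimals=2)

# apply calls round2 once per column, so each call is vectorized.
by_column = df.apply(round2)

# np.round on the whole frame should give the same values.
whole_frame = np.round(df, 2)
print(np.allclose(by_column.values, whole_frame.values))
```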

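As @EMS suggests, you can also use np.round, since DataFrame implements the __array__ attribute and automatically wraps many of numpy's ufuncs. It is also about twice as fast with the frame shown above: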

In [47]: timeit np.round(df, 2)
100 loops, best of 3: 17.4 ms per loop

If you have non-numeric columns, you can do this:

In [12]: df = DataFrame(randn(100000, 20))

In [13]: df['a'] = tm.choice(['a', 'b'], size=len(df))

In [14]: dfnum = df._get_numeric_data()

In [15]: np.round(dfnum)

to avoid the cryptic error that numpy throws when it tries to round a column of strings.
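
`_get_numeric_data` is a private method. As a hedged alternative (a swapped-in technique, not what the answer above uses), the sketch below selects the numeric columns with the public `select_dtypes` and rounds only those; the column names `x` and `label` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one string column.
df = pd.DataFrame({"x": [1.2345, 2.3456], "label": ["a", "b"]})

# Pick out the numeric columns through the public API.
numeric_cols = df.select_dtypes(include="number").columns

# Round only those columns; the string column is left untouched.
df[numeric_cols] = np.round(df[numeric_cols], 2)
print(df)
```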

score 3 · Accepted Answer

I'm leaving this here to explain why the OP's approach raised an error, but the later solutions are better.

The best solution is simply to use the Series round method:

In [11]: s
Out[11]: 
0    0.026574
1    0.304801
2    0.057819
dtype: float64

In [12]: 100*s.round(2)
Out[12]:  
0     3
1    30
2     6
dtype: float64

You could also tack .astype('int') on there, depending on what you want to do next.

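To understand why your approach doesn't work, remember that the function round needs two arguments: the data to be rounded and the number of decimal places. Writing round(2) calls round immediately and hands the resulting number to apply, which then tries to call it; hence "'float' object is not callable". In general, to apply a function that takes two arguments, you can "curry" it like so: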

In [13]: s.apply(lambda x: round(x, 2))
Out[13]: 
0    1.03
1    1.30
2   -1.06
dtype: float64

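As DSM points out in the comments, for this case one would actually need the currying approach, since DataFrames (before Pandas 0.17) don't have a round method. df.applymap(...) is the way to go.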
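
A minimal sketch of why the original call fails and how deferring the call fixes it, using the Series values shown above. Under Python 3, `round(2)` returns an int rather than a float, so the error message names 'int' instead, but the cause is the same.

```python
import pandas as pd

s = pd.Series([0.026574, 0.304801, 0.057819])

# round(2) is evaluated right away and yields a plain number, not a function,
# so there is nothing for apply to call.
not_a_function = round(2)
print(callable(not_a_function))  # False

# Wrapping the call in a lambda defers it and fixes the second argument.
rounded = s.apply(lambda x: round(x, 2))
print(rounded.tolist())
```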
