python - What is the best way to perform an operation across multiple DataFrames?

Question

Suppose I have three DataFrames:

import pandas as pd
import numpy as np

cols = ['A','B','C']
index = [1,2,3,4,5]
np.random.seed(42)

apple = pd.DataFrame(np.random.randn(5,3), index=index, columns=cols)
orange = pd.DataFrame(np.random.randn(5,3), index=index, columns=cols)
banana = pd.DataFrame(np.random.randn(5,3), index=index, columns=cols)

In [50]: apple
Out[50]:
          A         B         C
1  0.496714 -0.138264  0.647689
2  1.523030 -0.234153 -0.234137
3  1.579213  0.767435 -0.469474
4  0.542560 -0.463418 -0.465730
5  0.241962 -1.913280 -1.724918

In [51]: orange
Out[51]:
          A         B         C
1 -0.562288 -1.012831  0.314247
2 -0.908024 -1.412304  1.465649
3 -0.225776  0.067528 -1.424748
4 -0.544383  0.110923 -1.150994
5  0.375698 -0.600639 -0.291694

In [52]: banana
Out[52]:
          A         B         C
1 -0.601707  1.852278 -0.013497
2 -1.057711  0.822545 -1.220844
3  0.208864 -1.959670 -1.328186
4  0.196861  0.738467  0.171368
5 -0.115648 -0.301104 -1.478522

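What is the best/fastest/simplest way to create a new DataFrame with the same columns and index, where each cell holds the maximum of the corresponding cells in apple, orange, and banana? For example, for [1,A] the new DataFrame would hold 0.496714, for [1,B] it would hold 1.852278, and so on. Thanks!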

score 3 · Accepted Answer

I think something like this should be quite fast:

np.maximum(np.maximum(orange, apple), banana)

This uses numpy.maximum():

Element-wise maximum of array elements.
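To make "element-wise" concrete, here is a minimal sketch on two small arrays. The arrays `a` and `b` are made-up values for illustration only; they do not come from the question.

```python
import numpy as np

# Hypothetical sample arrays, chosen only to illustrate the behaviour
a = np.array([1.0, 5.0, 3.0])
b = np.array([4.0, 2.0, 6.0])

# Compare position by position and keep the larger value at each position
pairwise_max = np.maximum(a, b)
print(pairwise_max)
```

Each output position holds the larger of the two inputs at that position, which is the behaviour the answer relies on for the DataFrames.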

As @Jeff suggested in the comments, the general form is:

from functools import reduce  # reduce is not a builtin on Python 3
reduce(np.maximum, [orange, apple, banana])
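The nested call and the reduce form can be checked against each other. The sketch below rebuilds the three frames from the question and assumes a reasonably recent pandas, where NumPy ufuncs applied to DataFrames return DataFrames. The names `nested` and `reduced` are introduced here for illustration.

```python
from functools import reduce

import numpy as np
import pandas as pd

cols = ['A', 'B', 'C']
index = [1, 2, 3, 4, 5]
np.random.seed(42)

# Same construction order as in the question, so the values match
apple = pd.DataFrame(np.random.randn(5, 3), index=index, columns=cols)
orange = pd.DataFrame(np.random.randn(5, 3), index=index, columns=cols)
banana = pd.DataFrame(np.random.randn(5, 3), index=index, columns=cols)

# Explicit nesting for exactly three frames
nested = np.maximum(np.maximum(orange, apple), banana)

# Fold np.maximum over a list of any length
reduced = reduce(np.maximum, [orange, apple, banana])

print(reduced)
print(reduced.equals(nested))
```

The reduce form scales to any number of frames without changing the expression, as long as all frames share the same index and columns.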

score 0 · Accepted Answer

Why not combine the DataFrames into a Panel and then use Panel.max()?

i.e.: pd.Panel({'a': apple, 'b': banana, 'o': orange}).max(axis=0)

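Admittedly this is not the fastest option, but it guarantees correct index alignment, and you may want to use the Panel for other things later. Your data looks 3D, with three index dimensions (cols/index/fruit), so use a 3D data structure.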
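pd.Panel was removed in pandas 0.25, so the call above will not run on current installations. The following is a sketch of the same alignment-by-label idea using pd.concat with keys plus groupby. It is a substitute technique, not the answerer's code, and the names `stacked`, `result`, and `expected` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C']
index = [1, 2, 3, 4, 5]
np.random.seed(42)

apple = pd.DataFrame(np.random.randn(5, 3), index=index, columns=cols)
orange = pd.DataFrame(np.random.randn(5, 3), index=index, columns=cols)
banana = pd.DataFrame(np.random.randn(5, 3), index=index, columns=cols)

# Stack the frames vertically; the dict keys become the outer index level
# (the "fruit" dimension), and the original row labels become level 1
stacked = pd.concat({'a': apple, 'b': banana, 'o': orange})

# Group rows by their original label (level 1) and take the max per column,
# so values are matched by label rather than by position
result = stacked.groupby(level=1).max()

# Cross-check against the np.maximum approach from the other answer
expected = np.maximum(np.maximum(orange, apple), banana)
print(np.allclose(result.to_numpy(), expected.to_numpy()))
```

Because the grouping is done on index labels, frames whose rows are in a different order would still be combined label by label.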
