Suppose I have two Series in pandas:

from datetime import datetime, timedelta
import pandas as pd
d = datetime.now()
index = [d + timedelta(seconds = i) for i in range(5)]
a = pd.Series([1,4,5,7,8], index = index)
b = pd.Series([2,3,6,7,8], index = index)

What is the best way to get the element-wise min/max of the values at corresponding index labels? For example:

min_func(a, b): [1,3,5,7,8] (for given index)
max_func(a, b): [2,4,6,7,8]

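The only functions I could find in the documentation are min/max, which return the min/max within a single Series, and .apply does not take an index argument. Is there a better way to do this without iterating over the Series manually or resorting to arithmetic tricks (such as min_func: a * (a < b) + b * (b <= a), max_func: a * (a > b) + b * (b >= a))?

Thanks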

1 Answer

Combine the Series into a DataFrame, which aligns them on the index automatically

In [51]: index
Out[51]: 
[datetime.datetime(2013, 8, 26, 18, 33, 48, 990974),
 datetime.datetime(2013, 8, 26, 18, 33, 49, 990974),
 datetime.datetime(2013, 8, 26, 18, 33, 50, 990974),
 datetime.datetime(2013, 8, 26, 18, 33, 51, 990974),
 datetime.datetime(2013, 8, 26, 18, 33, 52, 990974)]

In [52]: a = pd.Series([1,4,5,7,8], index = index)

In [53]: b = pd.Series([2,3,6,7,8], index = index)

In [54]: a
Out[54]: 
2013-08-26 18:33:48.990974    1
2013-08-26 18:33:49.990974    4
2013-08-26 18:33:50.990974    5
2013-08-26 18:33:51.990974    7
2013-08-26 18:33:52.990974    8
dtype: int64

In [55]: b
Out[55]: 
2013-08-26 18:33:48.990974    2
2013-08-26 18:33:49.990974    3
2013-08-26 18:33:50.990974    6
2013-08-26 18:33:51.990974    7
2013-08-26 18:33:52.990974    8
dtype: int64

In [56]: df = pd.DataFrame({'a': a, 'b': b})

In [57]: df
Out[57]: 
                            a  b
2013-08-26 18:33:48.990974  1  2
2013-08-26 18:33:49.990974  4  3
2013-08-26 18:33:50.990974  5  6
2013-08-26 18:33:51.990974  7  7
2013-08-26 18:33:52.990974  8  8
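
To illustrate what "aligned on the index" means, here is a minimal sketch using two hypothetical Series whose labels only partly overlap (the labels "x", "y", "z" are made up for this example and are not from the question). Labels missing from one Series show up as NaN in the frame.

```python
import pandas as pd

# Hypothetical data: b has no value for label "x".
a = pd.Series([1, 4, 5], index=["x", "y", "z"])
b = pd.Series([2, 3], index=["y", "z"])

# Building a frame from a dict of Series uses the union of the indexes
# and fills positions that one Series lacks with NaN.
df = pd.DataFrame({"a": a, "b": b})
print(df)
```

When both Series share exactly the same index, as in the question, no NaN values are introduced.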

Row-wise min/max

In [9]: df.max(axis=1)
Out[9]: 
2013-08-26 18:33:48.990974    2
2013-08-26 18:33:49.990974    4
2013-08-26 18:33:50.990974    6
2013-08-26 18:33:51.990974    7
2013-08-26 18:33:52.990974    8
Freq: S, dtype: int64

In [10]: df.min(axis=1)
Out[10]: 
2013-08-26 18:33:48.990974    1
2013-08-26 18:33:49.990974    3
2013-08-26 18:33:50.990974    5
2013-08-26 18:33:51.990974    7
2013-08-26 18:33:52.990974    8
Freq: S, dtype: int64
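
The session above can be condensed into a self-contained script. This is a sketch rather than a definitive recipe: the fixed start timestamp is an assumption made for reproducibility, and the `np.minimum` / `np.maximum` calls are an alternative not mentioned in the answer, shown only for comparison.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Fixed start time (assumption) instead of datetime.now(), so runs are repeatable.
start = datetime(2013, 8, 26, 18, 33, 48)
index = [start + timedelta(seconds=i) for i in range(5)]
a = pd.Series([1, 4, 5, 7, 8], index=index)
b = pd.Series([2, 3, 6, 7, 8], index=index)

# Approach from the answer: put both Series in a frame, reduce across columns.
df = pd.DataFrame({"a": a, "b": b})
row_min = df.min(axis=1)
row_max = df.max(axis=1)

# Alternative (not from the answer): NumPy element-wise ufuncs on the two Series.
np_min = np.minimum(a, b)
np_max = np.maximum(a, b)

print(row_min.tolist())
print(row_max.tolist())
```

One difference worth keeping in mind: `df.min(axis=1)` skips NaN by default, whereas `np.minimum` propagates NaN, so the two approaches can disagree when the Series have missing values.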

Column label of the min/max

In [11]: df.idxmax(axis=1)
Out[11]: 
2013-08-26 18:33:48.990974    b
2013-08-26 18:33:49.990974    a
2013-08-26 18:33:50.990974    b
2013-08-26 18:33:51.990974    a
2013-08-26 18:33:52.990974    a
Freq: S, dtype: object

In [12]: df.idxmin(axis=1)
Out[12]: 
2013-08-26 18:33:48.990974    a
2013-08-26 18:33:49.990974    b
2013-08-26 18:33:50.990974    a
2013-08-26 18:33:51.990974    a
2013-08-26 18:33:52.990974    a
Freq: S, dtype: object
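
Note that in the last two rows the values are tied (7 and 7, 8 and 8), and both idxmax and idxmin report 'a'. A minimal sketch of that tie behavior, using a plain integer index instead of timestamps to keep it short (an assumption made for brevity):

```python
import pandas as pd

# Same values as in the question, default integer index.
df = pd.DataFrame({"a": [1, 4, 5, 7, 8], "b": [2, 3, 6, 7, 8]})

# For each row, the label of the column holding the max / min.
# On ties, the first column in frame order is returned.
max_col = df.idxmax(axis=1)
min_col = df.idxmin(axis=1)

print(max_col.tolist())
print(min_col.tolist())
```

If ties need to be told apart from genuine wins by 'a', comparing the columns directly (for example `df["a"] == df["b"]`) is one way to detect them.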
Answered 2013-08-26T22:36:40.037