python - Speeding up finding the mean of the top 5 numbers in a rolling window

Question

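I want to create a column containing the mean of the 5 highest values within a rolling window of 30. Using a for loop is very slow on a large DataFrame. I tried combining rolling() with nlargest(), but it doesn't work. Any suggestions for speeding this up?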

def top_values(df, column, days):
    # Mean of the `days` largest values in `column`
    top5 = df.nlargest(days, column)
    return top5[column].sum() / days

x = 0
w = 0
for i in df.index:
    if x >= 30:
        # Window: the 30 rows at positions w..x-1
        df.loc[i, 'tops'] = top_values(df.iloc[w:x], 'column', 5)
        w += 1
    x += 1

score 4 · Accepted Answer

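One approach is to pass a lambda function to rolling(...).apply(), for example to take the mean of the first 5 elements of each window sorted in descending order: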

df['column'].rolling(30).apply(lambda x: np.mean(sorted(x,reverse=True)[:5]))
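A possible variation on the one-liner above, offered as a sketch and not as part of the original answer: passing raw=True makes apply hand each window to the function as a NumPy array instead of a Series, and np.partition picks out the largest values without fully sorting the window. The helper name top_n_mean and the random test data are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

def top_n_mean(window, n=5):
    # `window` is a NumPy array because raw=True is passed to apply below.
    # np.partition moves the n largest values into the last n slots
    # without fully sorting the window.
    return np.partition(window, -n)[-n:].mean()

# Hypothetical data: 1,000 random integers in a column named 'column'.
rng = np.random.default_rng(0)
df = pd.DataFrame({'column': rng.integers(0, 100, size=1000)})

# The first 29 rows stay NaN because their windows are incomplete.
df['tops'] = df['column'].rolling(30).apply(top_n_mean, raw=True)
```

Whether this is noticeably faster depends on the data size and the pandas version, so it is worth timing against the lambda version on your own DataFrame.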

Minimal example:

To demonstrate on a 15-element DataFrame, we can take the mean of the top 3 values in a window of 5:

>>> df
    column
0       48
1        9
2       36
3       71
4       59
5       16
6        9
7       18
8       43
9        3
10      54
11      23
12      12
13      38
14      54

>>> df['column'].rolling(5).apply(lambda x: np.mean(sorted(x,reverse=True)[:3]))
0           NaN
1           NaN
2           NaN
3           NaN
4     59.333333
5     55.333333
6     55.333333
7     49.333333
8     40.000000
9     25.666667
10    38.333333
11    40.000000
12    40.000000
13    38.333333
14    48.666667
Name: column, dtype: float64
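For large DataFrames, calling a Python function once per window may still be the bottleneck. One alternative, sketched here under the assumption that the column has no missing values and that NumPy 1.20 or later is available, is to build all windows at once with numpy.lib.stride_tricks.sliding_window_view and partition them in a single call. The helper name rolling_top_n_mean is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_top_n_mean(series, window, n):
    # Hypothetical helper: mean of the n largest values in each full window.
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    if len(values) >= window:
        # One row per window position, shape (len - window + 1, window).
        windows = sliding_window_view(values, window)
        # Partition each row so its n largest values sit in the last n columns.
        top = np.partition(windows, -n, axis=1)[:, -n:]
        out[window - 1:] = top.mean(axis=1)
    return pd.Series(out, index=series.index, name=series.name)

# Same 15 values as the example above.
df = pd.DataFrame({'column': [48, 9, 36, 71, 59, 16, 9, 18, 43, 3, 54, 23, 12, 38, 54]})
result = rolling_top_n_mean(df['column'], window=5, n=3)
```

On the example data this should reproduce the values shown above (NaN for the first four rows, then 59.333333 at index 4, and so on). Note that np.partition copies the windowed view, so memory use grows with the number of rows times the window size.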
