
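I have a pandas DataFrame containing financial data that I am trying to build some technical indicators for. I am trying to figure out how to use a moving-window function to speed things up instead of going element by element. For each index, I want to return the index of the maximum over the past 30 days. I have implemented an element-by-element solution, but as you can imagine it is very slow.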

    for s_sym in ls_symbols:
        for i in range(refresh, len(ldt_timestamps)):
            # Aroon-Up   = ((period - Days Since High) / period) x 100
            # Aroon-Down = ((period - Days Since Low) / period) x 100
            whrmax = df_close[s_sym].loc[ldt_timestamps[i-uplen:i]].idxmax()
            maxaway = (df_close[s_sym].loc[whrmax : ldt_timestamps[i-1]]).count()
            aroonup = ((uplen - maxaway) / uplen ) * 100

            whrmin = df_close[s_sym].loc[ldt_timestamps[i-dnlen:i]].idxmin()
            minaway = df_close[s_sym].loc[whrmin : ldt_timestamps[i-1]].count()
            aroondn = ((dnlen - minaway) / dnlen ) * 100

How do I create a custom rolling window function?


2 Answers


See the documentation at:

http://pandas.pydata.org/pandas-docs/dev/computation.html#moving-rolling-statistics-moments

There are also some good examples here:

http://pandas.pydata.org/pandas-docs/dev/cookbook.html#grouping

In [18]: df = DataFrame(randn(1000,4),index=pd.date_range('20000101',periods=1000),
                 columns=list('ABCD'))

In [19]: pandas.stats.moments.rolling_apply(df,30,lambda x: Series(x).idxmax())
Out[19]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00
Freq: D
Data columns (total 4 columns):
A    971  non-null values
B    971  non-null values
C    971  non-null values
D    971  non-null values
dtypes: float64(4)

In [47]: pandas.stats.moments.rolling_apply(df,30,lambda x: Series(x).idxmax()).tail(30)
Out[47]: 
             A   B   C   D
2002-08-28  24   3  26  21
2002-08-29  23   2  25  20
2002-08-30  22   1  24  19
2002-08-31  21   0  23  18
2002-09-01  20   6  29  17
2002-09-02  19   5  28  16
2002-09-03  18   4  27  15
2002-09-04  17   3  26  14
2002-09-05  16   2  25  13
2002-09-06  15   1  24  12
2002-09-07  14   0  23  11
2002-09-08  13  13  22  10
2002-09-09  12  12  21   9
2002-09-10  11  11  20   8
2002-09-11  10  10  19   7
2002-09-12   9   9  18   6
2002-09-13   8   8  17   5
2002-09-14   7   7  16   4
2002-09-15   6   6  15   3
2002-09-16   5   5  14   2
2002-09-17   4   4  13   1
2002-09-18   3   3  12   0
2002-09-19   2   2  11  11
2002-09-20   1   1  10  10
2002-09-21   0   0   9   9
2002-09-22  27  25   8   8
2002-09-23  26  24   7   7
2002-09-24  25  23   6   6
2002-09-25  24  22   5   5
2002-09-26  23  29   4   4
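
The `pandas.stats.moments` functions used above are no longer available in current pandas releases. A minimal sketch of the same computation with the `.rolling()` API, assuming a recent pandas and NumPy (the seeded random data is only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data with the same shape as the session above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=pd.date_range('20000101', periods=1000),
                  columns=list('ABCD'))

# raw=True hands each 30-row window to the function as a NumPy array,
# so np.argmax returns the position (0..29) of the maximum in that window.
# The first 29 rows have no full window and come back as NaN.
pos_of_max = df.rolling(30).apply(np.argmax, raw=True)

print(pos_of_max.tail())
```

The result holds positions within each window, stored as floats because of the leading NaN rows.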

FYI, you are almost certainly better off using rolling_max(df,30) to get the maximum value over a given window, which I gather is what you are after.
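
In recent pandas, `rolling_max(df,30)` is spelled `df.rolling(30).max()`. If the goal is the Aroon value from the question rather than the maximum itself, a rough sketch follows. `aroon_up` is a hypothetical helper, and it assumes the window is the last `period` rows including the current one, so "days since high" is `period - 1` minus the position of the maximum. The loop in the question counts its window slightly differently, so adjust the offsets as needed.

```python
import numpy as np
import pandas as pd

def aroon_up(close: pd.Series, period: int = 30) -> pd.Series:
    """Aroon-Up over a trailing window of `period` rows (hypothetical helper)."""
    # Position of the maximum inside each window: 0 = oldest row, period-1 = newest.
    pos_of_high = close.rolling(period).apply(np.argmax, raw=True)
    # Assumption: a high on the current row counts as 0 days ago.
    days_since_high = (period - 1) - pos_of_high
    return (period - days_since_high) / period * 100

close = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 4.5])
rolling_high = close.rolling(3).max()   # modern form of rolling_max(close, 3)
up = aroon_up(close, period=3)
print(pd.DataFrame({'close': close, 'high': rolling_high, 'aroon_up': up}))
```

Aroon-Down follows the same pattern with `np.argmin` in place of `np.argmax`.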

answered 2013-04-23T00:14:33.377

A monotonic deque solves this in O(N), which is much faster.

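from collections import deque

def get_rolling_idxmin(l_input: list, window_size: int) -> list:
    """Return (index, value) of the minimum of each trailing window of l_input."""
    res = []
    deq = deque()  # (index, value) pairs, values increasing from front to back
    for i, v in enumerate(l_input):
        # Drop the front entry once it has fallen out of the window.
        if deq and (i - deq[0][0]) >= window_size:
            deq.popleft()  # O(1), unlike list.pop(0)
        # Drop back entries that can no longer be the minimum.
        while deq and v <= deq[-1][1]:
            deq.pop()
        deq.append((i, v))
        res.append(deq[0])
    return res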

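# Run on the reversed column so that each window looks forward from a row
# instead of backward.
l_min = get_rolling_idxmin(df.bp1[::-1].to_list(), 50)
df_min = pd.DataFrame(l_min, columns=['index_min', 'value_min'])
# Map positions in the reversed list back to positions in the original order.
df_min['index_min'] = df_min.shape[0]-1-df_min.index_min
df_min = df_min[::-1]
df_min.reset_index(drop=True, inplace=True)
# print(df_min)
# Note: this column-wise concat lines up only if df has a default 0..n-1 index.
df = pd.concat([df,df_min], axis=1)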
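
The question asks for the index of the maximum, so the same idea mirrored for the maximum could look like the sketch below. `rolling_argmax` is a hypothetical name, not a pandas function. It assumes a trailing window over a plain list and returns integer positions; on ties it reports the most recent position.

```python
from collections import deque

def rolling_argmax(values, window_size):
    """For each i, return the index of the max of values[max(0, i-window_size+1) : i+1]."""
    result = []
    candidates = deque()  # indices whose values strictly decrease from front to back
    for i, v in enumerate(values):
        # Drop the front index once it has fallen out of the window.
        if candidates and i - candidates[0] >= window_size:
            candidates.popleft()
        # Drop back indices whose values can no longer be the maximum.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        result.append(candidates[0])
    return result

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(rolling_argmax(data, 3))  # [0, 0, 2, 2, 4, 5, 5, 5]
```

Each index is appended once and removed at most once, which is where the O(N) bound comes from.
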
answered 2021-11-18T03:53:09.587