I want to compute weekly returns, but with the weeks counted backward from the end date. Here is my initial attempt using pandas:

import pandas as pd
import numpy as np
from pandas.tseries.offsets import BDay

index = pd.date_range(start='2020-09-13', end='2020-10-13', freq=BDay())
index_len = len(index)
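# Build the values from integer steps: np.arange with a float step can be
# off by one element because of floating-point rounding.
dfw = pd.DataFrame(data=1 + 0.002 * np.arange(index_len),
                   index=index,
                   columns=['col1'])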


def weekly_ret(x):
    if x.size > 0:
        print(f"range is {x.index[0]} - {x.index[-1]}")
        return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]
    else:
        return np.nan


dfw = dfw.resample(rule='5B').apply(weekly_ret)
print(dfw)

I then get the following output, which is not what I want:

range is 2020-09-14 00:00:00 - 2020-09-18 00:00:00
range is 2020-09-21 00:00:00 - 2020-09-25 00:00:00
range is 2020-09-28 00:00:00 - 2020-10-02 00:00:00
range is 2020-10-05 00:00:00 - 2020-10-09 00:00:00
range is 2020-10-12 00:00:00 - 2020-10-13 00:00:00
                col1
2020-09-14  0.008000
2020-09-21  0.007921
2020-09-28  0.007843
2020-10-05  0.007767
2020-10-12  0.001923

I want it to start at 2020-10-13 and work backward, so that the last range is:

range is 2020-10-07 00:00:00 - 2020-10-13 00:00:00 

instead of:

range is 2020-10-12 00:00:00 - 2020-10-13 00:00:00

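So far I have tried:

  1. Reversing the DataFrame: dfw = dfw.reindex(index=dfw.index[::-1])
  2. Step #1 above plus the rule -5B, which raises an error.
  3. Using the origin parameter of resample, i.e. origin=dfw.index[-1], but this has no effect on the order of the calculation.
  4. Step #1 above plus a rolling calculation on the reversed DataFrame, dfw = dfw.rolling(5).apply(weekly_ret)[::5], but here I get NaN for the first (i.e. last) interval, and this solution is also somewhat wasteful.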
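As a point of reference for attempt 4, here is a minimal sketch of the rolling idea applied to the frame in its original order. It assumes the NaN comes from the first four rows of a 5-row rolling window being incomplete, so stepping backward from the last row and dropping the incomplete window avoids it. The names `s`, `window_ret`, `rolled` and `from_end` are illustrative only, and the approach still computes every window, so it remains wasteful.

```python
import numpy as np
import pandas as pd

# Same sample data as above: 22 business days, values 1.000 .. 1.042.
index = pd.bdate_range(start="2020-09-13", end="2020-10-13")
s = pd.Series(1 + 0.002 * np.arange(len(index)), index=index, name="col1")


def window_ret(x):
    # Return over one window: (last - first) / first.
    return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]


# A 5-row window is incomplete for the first 4 rows, so those are NaN.
rolled = s.rolling(5).apply(window_ret, raw=False)

# Step backward from the last row in strides of 5, drop the incomplete
# window at the start of the series, and restore chronological order.
from_end = rolled.iloc[::-5].dropna().sort_index()
print(from_end)
```

Note that this groups by row count (five observations per block), not by calendar week, so the two can differ when the index has gaps such as holidays.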

Update: this would be the desired output; note that the last return covers the week counted back from the last day in the index:

range is 2020-09-16 00:00:00 - 2020-09-22 00:00:00 = 0.007968127490039847
range is 2020-09-23 00:00:00 - 2020-09-29 00:00:00 = 0.00788954635108482
range is 2020-09-30 00:00:00 - 2020-10-06 00:00:00 = 0.007812500000000007
range is 2020-10-07 00:00:00 - 2020-10-13 00:00:00 = 0.00773694390715668
                col1
2020-09-22  0.007968
2020-09-29  0.007890
2020-10-06  0.007813
2020-10-13  0.007737 i.e. (1.042 - 1.034)/1.034

1 Answer

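What you are looking for is an anchored offset, i.e. resampling the DataFrame weekly with each week ending on the same weekday as the last entry in your index. In your case, 2020-10-13 is a Tuesday, so you want the rule W-TUE. I suggest using a lookup dictionary to convert the .weekday() number (e.g. Tuesday == 1) into the corresponding rule. Then you just pass that rule to .resample() and apply your function, as the lookup-dictionary code below does: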
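A quick way to see what the anchored rule does is to list the dates it generates over the sample range. The sketch below only checks the weekday convention (Monday is 0) and the bin end dates; the variable names `last_day` and `bin_ends` are illustrative.

```python
import pandas as pd

last_day = pd.Timestamp("2020-10-13")

# Timestamp.weekday() counts from Monday == 0, so Tuesday is 1.
print(last_day.weekday(), last_day.day_name())

# 'W-TUE' is a weekly frequency anchored on Tuesdays; these dates are the
# right-hand edges of the weekly bins that resample will use.
bin_ends = pd.date_range(start="2020-09-14", end=last_day, freq="W-TUE")
print(bin_ends)
```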

rule_lookup={
    0:'W-MON',
    1:'W-TUE',
    2:'W-WED',
    3:'W-THU',
    4:'W-FRI',
    5:'W-SAT',
    6:'W-SUN'
}

# get the proper rule which ends on the last date in the index
rule = rule_lookup[dfw.index[-1].weekday()] 
print(f"=> resampling using rule: {rule}")
dfw = dfw.resample(rule=rule).apply(weekly_ret)
print(dfw)

Output:

=> resampling using rule: W-TUE
range is 2020-09-14 00:00:00 - 2020-09-15 00:00:00
range is 2020-09-16 00:00:00 - 2020-09-22 00:00:00
range is 2020-09-23 00:00:00 - 2020-09-29 00:00:00
range is 2020-09-30 00:00:00 - 2020-10-06 00:00:00
range is 2020-10-07 00:00:00 - 2020-10-13 00:00:00
                col1
2020-09-15  0.002000
2020-09-22  0.007968
2020-09-29  0.007890
2020-10-06  0.007813
2020-10-13  0.007737
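The output above still contains a partial first week (2020-09-14 to 2020-09-15), which the desired output in the question omits. The following sketch shows one possible way to drop it, and also builds the anchored rule from `pd.offsets.Week(weekday=...)` in place of the lookup dictionary. It assumes that a "full" week means exactly five observations, which would not hold for weeks containing holidays; the names `anchor`, `counts` and `full_weeks` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question: 22 business days, values 1.000 .. 1.042.
index = pd.bdate_range(start="2020-09-13", end="2020-10-13")
col1 = pd.Series(1 + 0.002 * np.arange(len(index)), index=index, name="col1")


def weekly_ret(x):
    # Return of one bin: (last - first) / first, NaN for an empty bin.
    if x.size > 0:
        return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]
    return np.nan


# Week(weekday=n) is the offset behind the 'W-MON' .. 'W-SUN' aliases
# (0 == Monday), so it can be built directly from .weekday().
anchor = pd.offsets.Week(weekday=col1.index[-1].weekday())
weekly = col1.resample(anchor).apply(weekly_ret)

# Assumption: a full week has 5 observations; keep only those bins.
counts = col1.resample(anchor).count()
full_weeks = weekly[counts == 5]
print(full_weeks)
```

With the sample data this keeps the four weeks ending 2020-09-22, 2020-09-29, 2020-10-06 and 2020-10-13.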
Answered 2020-11-26T07:37:03.683