python - How to elegantly compute percentage change in a DataFrame that spans multiple dates?

Question

I have a DataFrame with a datetime index. It contains a single column, price.

In [9]: df = pd.DataFrame({'price':[3,5,6,10,11]}, index=pd.to_datetime(['2016-01-01 14:58:00', 
'2016-01-01 14:58:00', '2016-01-01 14:58:00', '2016-01-02 09:30:00', '2016-01-02 09:31:00']))
   ...: 

In [10]: df
Out[10]: 
                     price
2016-01-01 14:58:00      3
2016-01-01 14:58:00      5
2016-01-01 14:58:00      6
2016-01-02 09:30:00     10
2016-01-02 09:31:00     11

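I want to compute the forward return for each row, i.e. the percentage change in price from that row to the next one.

A pandas Series/DataFrame has a pct_change() method that computes the percentage change from the previous row; shifting the result by -1 aligns it as a forward return.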

In [12]: df['price'].pct_change().shift(-1)
Out[12]: 
2016-01-01 14:58:00    0.666667
2016-01-01 14:58:00    0.200000
2016-01-01 14:58:00    0.666667
2016-01-02 09:30:00    0.100000
2016-01-02 09:31:00         NaN
Name: price, dtype: float64
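
To make the alignment explicit, here is a minimal sketch on the same prices with a plain integer index (the variable names are only illustrative). It checks that pct_change().shift(-1) matches "next price divided by current price, minus one":

```python
import pandas as pd

price = pd.Series([3, 5, 6, 10, 11])

# pct_change() compares each row with the previous one:
# (price[t] - price[t-1]) / price[t-1], so the first row is NaN.
backward = price.pct_change()

# shift(-1) moves every value up one row, so row t now holds the
# return from row t to row t+1 and the last row becomes NaN.
forward = backward.shift(-1)

# The same forward return written out by hand.
manual = price.shift(-1) / price - 1

print(pd.DataFrame({'price': price, 'forward': forward, 'manual': manual}))
```

Neither version knows about the calendar day, which is why the third row picks up the next day's price.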

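However, I want the elements that cross a date boundary to be NaN.

That is, I want the last row of df['pct_change'].loc['2016-01-01 14:58:00'] to be NaN, because its pct_change is computed from the next day's data (2016-01-02 09:30:00).

Expected output: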

2016-01-01 14:58:00    0.666667
2016-01-01 14:58:00    0.200000
2016-01-01 14:58:00         NaN
2016-01-02 09:30:00    0.100000
2016-01-02 09:31:00         NaN
Name: price, dtype: float64

I could build a mask to filter those rows out, but I don't think that solution is elegant. Any suggestions?
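
For reference, the mask approach mentioned above could look roughly like the sketch below. This is one possible way to write it, not code from the original question; the names days and crosses_day are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'price': [3, 5, 6, 10, 11]},
    index=pd.to_datetime(['2016-01-01 14:58:00', '2016-01-01 14:58:00',
                          '2016-01-01 14:58:00', '2016-01-02 09:30:00',
                          '2016-01-02 09:31:00']))

# Calendar day of every row (timestamps truncated to midnight).
days = df.index.normalize().to_numpy()

# True where the next row falls on a different day. The last row has
# no next row at all, so it is marked True as well.
crosses_day = np.append(days[:-1] != days[1:], True)

# Compute the forward return, then blank out the cross-day rows.
masked = df['price'].pct_change().shift(-1).mask(crosses_day)
print(masked)
```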

score 3 · Accepted Answer

You can use GroupBy.apply, grouping by DatetimeIndex.date so that pct_change never looks across a day boundary:

s1 = df.groupby(df.index.date)['price'].apply(lambda x: x.pct_change().shift(-1))
print (s1)
2016-01-01 14:58:00    0.666667
2016-01-01 14:58:00    0.200000
2016-01-01 14:58:00         NaN
2016-01-02 09:30:00    0.100000
2016-01-02 09:31:00         NaN
Name: price, dtype: float64
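
A note on versions, offered as a sketch and not as part of the original answer: on recent pandas releases (2.0 and later, to my knowledge) GroupBy.apply prepends the group key as an extra index level by default, so the printed index may differ from the output shown above. Passing group_keys=False should keep the original DatetimeIndex. The second variant avoids apply altogether by using GroupBy.shift; the names s2, s3 and next_price are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'price': [3, 5, 6, 10, 11]},
    index=pd.to_datetime(['2016-01-01 14:58:00', '2016-01-01 14:58:00',
                          '2016-01-01 14:58:00', '2016-01-02 09:30:00',
                          '2016-01-02 09:31:00']))

# Variant 1: same idea as the answer, with group_keys=False so the
# date is not added as an extra index level.
s2 = (df.groupby(df.index.date, group_keys=False)['price']
        .apply(lambda x: x.pct_change().shift(-1)))

# Variant 2: take the next price within the same day, then divide.
# Rows that are last in their day get NaN from the grouped shift.
next_price = df.groupby(df.index.date)['price'].shift(-1)

# Work on the raw arrays so the duplicate timestamps in the index
# cannot interfere with label alignment.
s3 = pd.Series(next_price.to_numpy() / df['price'].to_numpy() - 1,
               index=df.index, name='price')

print(s2)
print(s3)
```

Variant 2 is usually faster on large frames because it avoids calling a Python lambda once per group.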
