
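I have a dataset recorded at 30-second intervals, and I'm trying to forecast with the ARMA function from statsmodels' time-series module. Because the data are private, I'm reproducing the error with random data: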

import numpy as np
from pandas import *
import statsmodels.api as sm
data = np.random.rand(100000)
data_index = date_range('2013-5-26', periods = len(data), freq='30s')
data = np.array(data)
data_series = Series(data, index = data_index)
model = sm.tsa.ARMA(data_series,(1,0)).fit()

My package versions:
Python 2.7.3
pandas 0.11.0
statsmodels 0.5.0

The main error message is below (I've omitted part of it):

ValueError        Traceback (most recent call last)
<ipython-input-24-0f57c74f0fc9> in <module>()

6 data = np.array(data)
7 data_series = Series(data, index = data_index)
----> 8 model = sm.tsa.ARMA(data_series,(1,0)).fit()

...........

...........

ValueError: freq 30S not understood

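It seems to me that ARMA doesn't support the dates generated by pandas? If I remove the freq option from date_range, the command fails again for a long series, because date_range then falls back to daily steps and the years run far past the pandas limit.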
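The arithmetic behind the "years beyond the pandas limit" problem can be checked with the standard library alone. This sketch assumes the question's setup (100000 observations starting 2013-05-26) and compares the last timestamp for 30-second steps against daily steps, which is what date_range uses when freq is omitted. `last_timestamp` is a hypothetical helper; the 2262 cutoff comes from pandas storing Timestamps as 64-bit nanoseconds (the maximum is 2262-04-11).

```python
from datetime import datetime, timedelta

# Assumption: mirrors the question's 100000 observations starting 2013-05-26.
N_OBS = 100000
START = datetime(2013, 5, 26)

# Latest year a pandas nanosecond Timestamp can represent (max is 2262-04-11).
PANDAS_MAX_YEAR = 2262


def last_timestamp(start, step, periods):
    """Return the timestamp of the final point on a regular grid of `periods` points."""
    return start + step * (periods - 1)


# freq='30s': the index ends only about five weeks after the start.
end_30s = last_timestamp(START, timedelta(seconds=30), N_OBS)

# No freq: date_range defaults to one step per day, so 100000 steps span ~274 years.
end_daily = last_timestamp(START, timedelta(days=1), N_OBS)

print(end_30s)
print(end_daily.year > PANDAS_MAX_YEAR)
```

The 30-second end point matches the end of the data_index quoted in the update below, while the daily version lands well past what pandas can store.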

Is there any way around this? Thanks.

Update: OK, using data_series.values works, but how do I make predictions after that? My data_index spans [2013-05-26 00:00:00, ..., 2013-06-29 17:19:30].

prediction = model.predict('2013-05-26 00:00:00', '2013-06-29 17:19:30', dynamic=False)

This call still gives me an error.

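I know that prediction = model.predict() runs and produces predictions for the entire series, which I could then match up against the dates, but in general that isn't very convenient.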
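Because the index is a regular 30-second grid, one possible workaround is to translate the date strings into integer positions yourself and pass those to predict. A minimal sketch, written for Python 3 with only the standard library: `to_position`, `START`, and `STEP` are hypothetical names, and the commented-out last line assumes the fitted results' predict() accepts zero-based integer start/end positions (statsmodels documents this for its ARMA results, but verify it for 0.5.0).

```python
from datetime import datetime, timedelta

STEP = timedelta(seconds=30)             # sampling interval from the question
START = datetime(2013, 5, 26, 0, 0, 0)   # first timestamp of data_index


def to_position(ts, start=START, step=STEP):
    """Map a 'YYYY-MM-DD HH:MM:SS' string on the regular grid to a 0-based position.

    Hypothetical helper: raises ValueError if the timestamp falls between grid points.
    """
    moment = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # divmod on two timedeltas returns (whole steps, leftover timedelta).
    steps, remainder = divmod(moment - start, step)
    if remainder:
        raise ValueError(f"{ts} does not fall on the {step} grid")
    return steps


start_pos = to_position("2013-05-26 00:00:00")
end_pos = to_position("2013-06-29 17:19:30")

# Assumed usage with a model fitted on data_series.values:
# prediction = model.predict(start_pos, end_pos, dynamic=False)
```

Rejecting off-grid timestamps keeps a typo in a date string from silently shifting the prediction window by one observation.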


1 Answer


The problem is that, for some reason, pandas doesn't return an offset for this freq, and we need an offset to be able to use the dates for anything. It looks like a pandas bug (or an unimplemented case) to me. You can check with:

from pandas.tseries.frequencies import get_offset
get_offset('30s')

Perhaps we could improve the error message though.

[Edit: We don't really need the dates except to add them back in for convenience when predicting, so you can still estimate the model by using data_series.values.]
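If the model is fitted on data_series.values, the "add the dates back" step can be done by hand, since every position on the 30-second grid maps to exactly one timestamp. A minimal standard-library sketch: `attach_dates` is a hypothetical helper, and the three numbers are placeholders standing in for real predict() output, not model results.

```python
from datetime import datetime, timedelta


def attach_dates(predictions, start, step, first_position=0):
    """Pair each prediction with the timestamp of its grid position.

    `first_position` is the integer start passed to predict(), so a window that
    begins mid-series still lines up with the right timestamps.
    """
    return [
        (start + step * (first_position + i), value)
        for i, value in enumerate(predictions)
    ]


# Placeholder values; in practice these would come from model.predict(...).
fake_predictions = [0.51, 0.48, 0.50]
paired = attach_dates(fake_predictions, datetime(2013, 5, 26), timedelta(seconds=30))

for ts, value in paired:
    print(ts, value)
```

With pandas available, the same pairing could instead be done by wrapping the predictions in a Series indexed by the matching slice of data_index.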

answered 2013-07-10T21:24:46.087