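I'm fairly new to this, but I'm trying to get a statsmodels ARMA prediction tool to work. I've imported some stock data from Yahoo and gotten ARMA to give me fitted parameters. However, when I use the predict code, all I get is a list of errors that I can't seem to figure out. I'm not sure what I'm doing wrong here: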

import pandas
import statsmodels.tsa.api as tsa
from pandas.io.data import DataReader

start = pandas.datetime(2013,1,1)
end = pandas.datetime.today()

data = DataReader('GOOG','yahoo')
arma =tsa.ARMA(data['Close'], order =(2,2))
results= arma.fit()
results.predict(start=start,end=end)

The error is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Windows\system32\<ipython-input-84-25a9b6bc631d> in <module>()
     13 results= arma.fit()
     14 results.summary()
---> 15 results.predict(start=start,end=end)

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\base\wrapper.pyc in wrapper(self, *args, **kwargs)
     88         results = object.__getattribute__(self, '_results')
     89         data = results.model.data
---> 90         return data.wrap_output(func(results, *args, **kwargs), how)
     91
     92     argspec = inspect.getargspec(func)

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in predict(self, start, end, exog, dynamic)
   1265
   1266         """
-> 1267         return self.model.predict(self.params, start, end, exog, dynamic)
   1268
   1269     def forecast(self, steps=1, exog=None, alpha=.05):

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in predict(self, params, start, end, exog, dynamic)
    497
    498         # will return an index of a date
--> 499         start = self._get_predict_start(start, dynamic)
    500         end, out_of_sample = self._get_predict_end(end, dynamic)
    501         if out_of_sample and (exog is None and self.k_exog > 0):

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in _get_predict_start(self, start, dynamic)
    404             #elif 'mle' not in method or dynamic: # should be on a date
    405             start = _validate(start, k_ar, k_diff, self.data.dates,
--> 406                               method)
    407             start = super(ARMA, self)._get_predict_start(start)
    408         _check_arima_start(start, k_ar, k_diff, method, dynamic)

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in _validate(start, k_ar, k_diff, dates, method)
    160     if isinstance(start, (basestring, datetime)):
    161         start_date = start
--> 162         start = _index_date(start, dates)
    163         start -= k_diff
    164     if 'mle' not in method and start < k_ar - k_diff:

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in _index_date(date, dates)
     37         freq = _infer_freq(dates)
     38         # we can start prediction at the end of endog
---> 39         if _idx_from_dates(dates[-1], date, freq) == 1:
     40             return len(dates)
     41

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in _idx_from_dates(d1, d2, freq)
     70         from pandas import DatetimeIndex
     71         return len(DatetimeIndex(start=d1, end=d2,
---> 72                                  freq = _freq_to_pandas[freq])) - 1
     73     except ImportError, err:
     74         from pandas import DateRange

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in __getitem__(self, key)
     11         # being lazy, don't want to replace dictionary below
     12         def __getitem__(self, key):
---> 13             return get_offset(key)
     14     _freq_to_pandas = _freq_to_pandas_class()
     15 except ImportError, err:

D:\Python27\lib\site-packages\pandas\tseries\frequencies.pyc in get_offset(name)
    484     """
    485     if name not in _dont_uppercase:
--> 486         name = name.upper()
    487
    488         if name in _rule_aliases:

AttributeError: 'NoneType' object has no attribute 'upper'

2 Answers

This looks like a bug to me. I'll look into it.

https://github.com/statsmodels/statsmodels/issues/712

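Edit: As a workaround, you can drop the DatetimeIndex from the DataFrame and pass the model a plain numpy array instead. That makes prediction a bit trickier date-wise, but predicting with dates when there is no frequency is already pretty tricky, so just having a start date and an end date is essentially meaningless.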

import pandas
import statsmodels.tsa.api as tsa
from pandas.io.data import DataReader

data = DataReader('GOOG','yahoo')
dates = data.index

# start at a date on the index
start = dates.get_loc(pandas.datetools.parse("1-2-2013"))
end = start + 30 # "steps"

# NOTE THE .values
arma =tsa.ARMA(data['Close'].values, order =(2,2))
results= arma.fit()
results.predict(start, end)
Answered 2013-03-20T05:04:43.773

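When I run your code, I get:

"ValueError: There is no frequency for these dates and date 2013-01-01 00:00:00 is not in dates index. Try giving a date that is in the dates index or use an integer"

Because trading dates occur at uneven intervals (holidays and weekends), the model isn't smart enough to work out the correct frequency.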
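
To see why no frequency can be worked out, here is a minimal sketch on a synthetic calendar rather than the Yahoo data (the date range and the removed holiday are assumptions picked for illustration). `pandas.infer_freq` should find a rule for a clean business-day range but give up as soon as one weekday is missing:

```python
import pandas as pd

# Regular business-day calendar: every weekday is present.
regular = pd.bdate_range("2013-01-07", "2013-01-25")

# Same calendar with one weekday removed, standing in for a market holiday
# (2013-01-21 is used here purely as an illustrative gap).
with_holiday = regular[regular != pd.Timestamp("2013-01-21")]

freq_regular = pd.infer_freq(regular)        # a business-day alias is expected
freq_holiday = pd.infer_freq(with_holiday)   # no single rule fits the gaps

print("regular calendar:", freq_regular)
print("calendar with a holiday:", freq_holiday)
```

A real price history has many such gaps, which is presumably how the frequency ends up as None inside the date lookup.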

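If you replace the dates with their integer positions in the index, you'll get your predictions. You can then simply put the original index back on the results.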

prediction = results.predict(start=0, end=len(data) - 1)
prediction.index = data.index
print(prediction)

2010-01-04    689.507451
2010-01-05    627.085986
2010-01-06    624.256331
2010-01-07    608.133481
...
2017-05-09    933.700555
2017-05-10    931.290023
2017-05-11    927.781427
2017-05-12    929.661014
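
The same idea should work for a sub-range: look up the integer position of a date that is actually on the index, predict by position, and reattach the matching slice of dates. The sketch below uses a synthetic index and a placeholder array in place of `results.predict(start, end)`, so nothing here depends on a fitted model. It assumes the end position is inclusive, which is consistent with the `start=0, end=len(data) - 1` call above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data.index: business days with one gap.
index = pd.bdate_range("2013-01-01", "2013-03-29")
index = index[index != pd.Timestamp("2013-01-21")]

# Integer position of a date that is on the index.
start = index.get_loc(pd.Timestamp("2013-01-02"))
end = start + 30  # inclusive end position, so 31 values in total

# Placeholder for results.predict(start, end): one value per position.
values = np.arange(start, end + 1, dtype=float)

# Reattach the matching slice of the original dates.
prediction = pd.Series(values, index=index[start:end + 1])
print(prediction.head())
```

This only labels in-sample positions. Positions past the end of the data have no date on the index, so any dates attached to them would have to be made up under some calendar assumption.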

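By the way, you'll probably want to run a model like this on daily returns rather than raw prices. Running it on raw prices won't capture momentum and mean reversion the way you might think. Your model is built on the absolute level of the price, not on changes in price, momentum, moving averages, or other factors you might want to use. The predictions you've created will look very good because they are only one-step-ahead forecasts, so they don't capture compounding error. This confuses a lot of people. The errors look small relative to the absolute level of the stock price, but the model isn't very predictive.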
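
A rough way to convince yourself of this, sketched on simulated prices rather than GOOG (the random walk, its 1% daily volatility, and the starting level of 500 are all assumptions for illustration): a "model" that just repeats yesterday's price looks excellent on price levels, yet it forecasts a return of exactly zero every day.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic prices: a random walk in log space, so by construction
# tomorrow's return cannot be predicted from the past.
log_returns = rng.normal(loc=0.0, scale=0.01, size=1000)
prices = 500.0 * np.exp(np.cumsum(log_returns))

# Naive one-step-ahead forecast: tomorrow's price equals today's price.
actual = prices[1:]
forecast = prices[:-1]

# Judged on price levels, the naive forecast looks very accurate.
mean_abs_pct_error = np.mean(np.abs(actual - forecast) / actual)
level_corr = np.corrcoef(actual, forecast)[0, 1]

# Judged on returns, it carries no information: the implied return
# forecast is zero on every single day.
returns = prices[1:] / prices[:-1] - 1.0
implied_return_forecast = forecast / prices[:-1] - 1.0

print("mean abs % error on prices:", mean_abs_pct_error)
print("correlation of forecast and actual prices:", level_corr)
print("std of actual daily returns:", returns.std())
print("largest implied return forecast:", np.abs(implied_return_forecast).max())
```

With a pandas Series such as `data['Close']`, the returns line would be `data['Close'].pct_change().dropna()`, and that series is what you would hand to the model instead of the raw closes.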

I'd suggest reading through this walkthrough first:

http://www.johnwittenauer.net/a-simple-time-series-analysis-of-the-sp-500-index/

Answered 2017-05-16T18:19:20.640