I am trying to forecast a time series with an ARIMA model. This is the series:
1960-01-01 12.7
1961-01-01 12.1
1962-01-01 12.7
1963-01-01 12.8
1964-01-01 12.3
1965-01-01 13.0
1966-01-01 12.5
1967-01-01 12.9
1968-01-01 12.9
1969-01-01 13.3
1970-01-01 13.2
1971-01-01 13.0
1972-01-01 12.6
1973-01-01 12.2
1974-01-01 12.4
1975-01-01 12.7
1976-01-01 12.6
1977-01-01 12.2
1978-01-01 12.5
1979-01-01 12.2
1980-01-01 12.2
1981-01-01 12.2
1982-01-01 12.1
1983-01-01 12.3
1984-01-01 11.7
1985-01-01 11.8
1986-01-01 11.5
1987-01-01 11.2
1988-01-01 11.0
1989-01-01 10.9
1990-01-01 10.8
1991-01-01 10.8
1992-01-01 10.6
1993-01-01 10.4
1994-01-01 10.2
1995-01-01 10.2
1996-01-01 10.2
1997-01-01 10.0
1998-01-01 9.8
1999-01-01 9.8
2000-01-01 9.6
2001-01-01 9.3
2002-01-01 9.4
2003-01-01 9.5
2004-01-01 9.1
2005-01-01 9.1
2006-01-01 9.0
2007-01-01 9.0
2008-01-01 9.0
2009-01-01 9.3
2010-01-01 9.2
2011-01-01 9.1
2012-01-01 9.4
2013-01-01 9.4
2014-01-01 9.2
2015-01-01 9.6
Name: Death rate, crude (per 1,000 people), dtype: float64
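I use the following code to generate different (p, d, q) values, fit a model for each one, and record its AIC. I then pick the order with the lowest AIC and use that (p, d, q) for the forecast.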
import datetime
import warnings
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error as mse

def MAPE(A, F):
    # Mean absolute percentage error between actual (A) and forecast (F) Series
    Av = np.array(A.values)
    Fv = np.array(F.values)
    mape = np.mean(np.abs((Av - Fv) / Av)) * 100
    mape = np.around(mape, decimals=2)
    return mape

# Generate pdq combinations
p = d = q = range(7)
pdq = list(itertools.product(p, d, q))
# Choose min pdq corresponding to min AIC
warnings.filterwarnings('ignore')
param_aic = {}
for param in pdq:
    try:
        # cmortS is the series shown above
        mod = sm.tsa.ARIMA(cmortS, order=param)
        result = mod.fit()
        param_aic[param] = result.aic
    except Exception:
        # Skip orders for which the fit fails
        continue
min_aic = min(param_aic.values())
min_param = ()
for pm, aic in param_aic.items():
    if aic == min_aic:
        min_param = pm
# Run the model with min pdq
model = sm.tsa.ARIMA(cmortS, order=min_param)
results = model.fit()
# Forecast validation
if min_param[1] > 0:
    tp = 'levels'
else:
    tp = 'linear'
train_sz = int(len(cmortS) * 0.66)
train = cmortS[:train_sz]
tst = cmortS[train_sz:]
pred_strt = tst.index[0]
tst_pred = results.predict(start=pred_strt, typ=tp)
mserror = mse(tst, tst_pred)
mserror = np.round(mserror, decimals=5)
mp = MAPE(tst, tst_pred)
print('Model order: {}, MAPE: {}%, mse: {}'.format(min_param, mp, mserror))
# Prediction
end_yr = '2050'
end_dt = pd.to_datetime(end_yr, format= '%Y')
strt_dt = pd.to_datetime('2014', format= '%Y')
Var_pred = results.predict(start= strt_dt, end= end_dt, typ = tp)
Var_pred
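As a side note, the min-AIC lookup in the selection step above can also be written as a single call. This is only a minimal sketch: the AIC values below are made up for illustration and are not results from the series.

```python
# Hypothetical AIC values keyed by (p, d, q); the numbers are made up for illustration.
param_aic = {(1, 0, 0): 41.2, (0, 1, 1): 35.7, (2, 1, 0): 36.9}

# min() over a dict iterates its keys; key=param_aic.get ranks them by their AIC.
min_param = min(param_aic, key=param_aic.get)
min_aic = param_aic[min_param]
print(min_param, min_aic)
```

This returns the same order as the explicit loop whenever the minimum is unique.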
When I run the script, I get the following error:
ValueError: Cannot add integral value to Timestamp without freq.
Even after re-indexing the series with a date range that uses freq='AS', I still get the same error.
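A re-indexing of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the exact code that was run: it uses only the first five values of the series, the name cmort_demo is hypothetical, and it spells the year-start frequency as 'YS', which recent pandas versions use in place of 'AS'.

```python
import pandas as pd

# First five values of the series shown above, used as a small stand-in.
values = [12.7, 12.1, 12.7, 12.8, 12.3]

# Build an annual, start-of-year index with an explicit frequency.
# 'YS' is the year-start alias; older pandas versions spell it 'AS'.
idx = pd.date_range(start='1960-01-01', periods=len(values), freq='YS')
cmort_demo = pd.Series(values, index=idx, name='Death rate, crude (per 1,000 people)')

# With a frequency attached to the index, stepping past the last date is well defined.
next_date = cmort_demo.index[-1] + cmort_demo.index.freq
print(cmort_demo.index.freq, next_date)
```

Checking that the index's freq attribute is not None before fitting is a quick way to confirm the frequency actually survived the re-indexing.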
How can I fix this?