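I am trying to forecast a time series in Python using auto_arima with Fourier terms added as exogenous features. The data come from Kaggle's Store Item Demand Forecasting Challenge. It is a long-format dataset covering 10 stores and 50 items, which gives 500 time series stacked on top of each other. What makes these series unusual is that they are daily data with both weekly and annual seasonality.
To capture both levels of seasonality, I first used TBATS, as Rob J Hyndman recommends for forecasting with daily data, and it actually works quite well.
I also followed this Medium article by the creator of the TBATS Python library, who compares it with SARIMAX + Fourier terms (also recommended by Hyndman).
But now, when I try the second approach with pmdarima's auto_arima and Fourier terms as exogenous features, I get unexpected results.
In the code below, I use only the train.csv file, which I split into training and test data (the last year is held out for forecasting), and I set the maximum order of the Fourier terms to K = 2.
My problem is that I get a smooth forecast (see the plot below) that does not seem to capture the weekly seasonality, unlike the result shown at the end of the article. Is there something wrong with my code?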
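For reference, here is a minimal sketch of what I understand the annual Fourier terms with K = 2 to be when written out by hand with NumPy. The helper name fourier_terms, the column names, the series length, and the choice to start the time index at 1 are my own assumptions for illustration; this is not necessarily identical to what FourierFeaturizer produces internally.

```python
import numpy as np
import pandas as pd


def fourier_terms(n_obs, period, k):
    """Build k sine/cosine pairs for one seasonal period.

    Hypothetical helper for illustration only. The time index t
    runs from 1 to n_obs, which is an assumption.
    """
    t = np.arange(1, n_obs + 1)
    columns = {}
    for order in range(1, k + 1):
        # Each order adds one harmonic of the seasonal period
        angle = 2.0 * np.pi * order * t / period
        columns[f"sin_{order}"] = np.sin(angle)
        columns[f"cos_{order}"] = np.cos(angle)
    return pd.DataFrame(columns)


# Annual terms (period 365.25) with K = 2 for an illustrative series length
annual_terms = fourier_terms(n_obs=1826, period=365.25, k=2)
print(annual_terms.head())
```

With K = 2 this gives four smooth columns that repeat once or twice per year, so on their own they can only describe the annual shape; the weekly pattern has to come from the seasonal part of the ARIMA model.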
Full code:
# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt
# Load the data: a long-format table of multiple time series stacked on top of each other
# There are 10 (stores) * 50 (items) = 500 time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)
# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales
# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 2)
y_prime, exog = four_terms.fit_transform(train_data)
exog['date'] = y_prime.index # is exactly the same as manual calculation in the above cells
exog = exog.set_index(exog['date'])
exog.index.freq = 'D'
exog = exog.drop(columns=['date'])
# Split the time series as well as exogenous features data into train and test splits
y_to_train = y_prime.iloc[:(len(y_prime)-365)]
y_to_test = y_prime.iloc[(len(y_prime)-365):] # last year for testing
exog_to_train = exog.iloc[:(len(exog)-365)]
exog_to_test = exog.iloc[(len(exog)-365):]
# Fit model
# Weekly seasonality covered by SARIMAX
arima_exog_model = auto_arima(y=y_to_train, exogenous=exog_to_train, seasonal=True, m=7)
# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast, index=pd.date_range(start='2017-01-01', end='2017-12-31'))
# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()
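To put a number on "smooth" rather than relying on the plot alone, one possible check is to compare the average value per day of the week in the actual data and in the forecast. The sketch below uses synthetic stand-in series, not the Kaggle data, and weekday_profile is a hypothetical helper name of mine.

```python
import numpy as np
import pandas as pd


def weekday_profile(series):
    """Mean of a daily series for each day of the week (0 = Monday)."""
    return series.groupby(series.index.dayofweek).mean()


# Synthetic stand-ins for one year of daily data (assumed values, not real sales)
dates = pd.date_range("2017-01-01", "2017-12-31", freq="D")
weekly_effect = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0])  # Monday..Sunday
actual = pd.Series(20.0 + weekly_effect[dates.dayofweek.to_numpy()], index=dates)
smooth = pd.Series(np.full(len(dates), 23.0), index=dates)  # flat "forecast"

actual_profile = weekday_profile(actual)
smooth_profile = weekday_profile(smooth)

# Spread between the highest and lowest weekday mean:
# close to zero means no weekly pattern is present
actual_spread = actual_profile.max() - actual_profile.min()
smooth_spread = smooth_profile.max() - smooth_profile.min()
print(actual_spread, smooth_spread)
```

Applied to my data, the same helper could presumably be called on y_to_test and on the single column of y_arima_exog_forecast; a forecast spread near zero next to a clearly non-zero spread in the actual data would confirm that the weekly seasonality is missing.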
Thanks in advance for your answers!