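I am trying to forecast a time series in Python using auto_arima with Fourier terms added as exogenous features. The data come from the Kaggle Store Item Demand Forecasting Challenge. It consists of long-format time series for 10 stores and 50 items, which gives 500 time series stacked on top of each other. What makes this data particular is that it is daily data with both weekly and annual seasonality.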
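
For readers unfamiliar with the long format, here is a minimal sketch of how stacked series can be separated by their (store, item) key. The small DataFrame is synthetic and only mimics the column names used later in the code (date, store, item, sales); the name `series_by_key` is made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for train.csv: 2 stores x 2 items x 3 days (values are made up)
dates = pd.date_range("2017-01-01", periods=3, freq="D")
rows = [
    {"date": d, "store": s, "item": i, "sales": 10 * s + i + n}
    for s in (1, 2)
    for i in (1, 2)
    for n, d in enumerate(dates)
]
long_df = pd.DataFrame(rows).set_index("date")

# One pandas Series of sales per (store, item) pair
series_by_key = {
    key: group["sales"]
    for key, group in long_df.groupby(["store", "item"])
}

print(len(series_by_key))       # number of individual time series
print(series_by_key[(1, 1)])    # the series for store 1, item 1
```

With the real file, the same grouping would yield the 500 series mentioned above; the code further down uses `query` to pick a single one instead.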

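To capture these two levels of seasonality, I first used TBATS, as recommended by Rob J Hyndman for forecasting with daily data, and it actually worked quite well.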

I also followed this Medium article posted by the creator of the TBATS Python library, who compared it with SARIMAX + Fourier terms (also recommended by Hyndman).

But now, when I try the second approach with pmdarima's auto_arima and Fourier terms as exogenous features, I get unexpected results.

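In the code below, I use only the train.csv file, which I split into training and test data (the last year is held out for forecasting), and I set the maximum order of the Fourier terms to K = 2.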
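
For reference, Fourier terms of order K for a seasonal period m are the pairs sin(2πkt/m) and cos(2πkt/m) for k = 1, ..., K. The following is a minimal hand-rolled sketch of that construction. The function name `fourier_terms` and the choice of 730 observations are assumptions for illustration, and the column order and time origin used by pmdarima's `FourierFeaturizer` may differ.

```python
import numpy as np

def fourier_terms(n_obs, period, k):
    """Return an (n_obs, 2 * k) array of sin/cos pairs for harmonics 1..k."""
    t = np.arange(1, n_obs + 1)  # time index 1, 2, ..., n_obs
    columns = []
    for harmonic in range(1, k + 1):
        angle = 2.0 * np.pi * harmonic * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Two years of daily observations, annual period, K = 2 (hypothetical sizes)
features = fourier_terms(n_obs=730, period=365.25, k=2)
print(features.shape)
```

Each harmonic contributes two columns, so K = 2 gives four exogenous features that vary slowly over the year; on their own they carry no weekly pattern.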

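My problem is that I get a smooth forecast (see the image below) that does not seem to capture the weekly seasonality, unlike the result at the end of that article. Is there something wrong with my code?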

Full code:

# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Load the data: a long-format file with multiple time series stacked on top of each other
# There are 10 (stores) * 50 (items) = 500 time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)

# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales

# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 2)
y_prime, exog = four_terms.fit_transform(train_data)
# Index the Fourier features by date so they align with the series
exog['date'] = y_prime.index
exog = exog.set_index('date')
exog.index.freq = 'D'


# Split the time series and the exogenous features into train and test sets
y_to_train = y_prime.iloc[:-365]
y_to_test = y_prime.iloc[-365:]  # last year for testing

exog_to_train = exog.iloc[:-365]
exog_to_test = exog.iloc[-365:]


# Fit model
# Weekly seasonality covered by SARIMAX (seasonal period m=7)
# Note: newer pmdarima releases name the `exogenous` argument `X`
arima_exog_model = auto_arima(y=y_to_train, exogenous=exog_to_train, seasonal=True, m=7)

# Forecast the 365 held-out days, passing the matching Fourier features
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast, index=pd.date_range(start='2017-01-01', end='2017-12-31'))


# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()

[Figure: actual and forecast data for the last year]

Thanks in advance for your answers!

1 Answer

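In case anyone is interested, here is the answer. Thanks again to Flavia Giammarino. Compared with the code in the question, the Fourier order is reduced from 2 to 1 and D=1 is passed to auto_arima.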

# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Load the data: a long-format file with multiple time series stacked on top of each other
# There are 10 (stores) * 50 (items) = 500 time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)

# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales

# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 1)
y_prime, exog = four_terms.fit_transform(train_data)
# Index the Fourier features by date so they align with the series
exog['date'] = y_prime.index
exog = exog.set_index('date')
exog.index.freq = 'D'


# Split the time series and the exogenous features into train and test sets
y_to_train = y_prime.iloc[:-365]
y_to_test = y_prime.iloc[-365:]  # last year for testing

exog_to_train = exog.iloc[:-365]
exog_to_test = exog.iloc[-365:]


# Fit model
# Weekly seasonality covered by SARIMAX (seasonal period m=7), with D=1 seasonal difference
# Note: newer pmdarima releases name the `exogenous` argument `X`
arima_exog_model = auto_arima(y=y_to_train, D=1, exogenous=exog_to_train, seasonal=True, m=7)

# Forecast the 365 held-out days, passing the matching Fourier features
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast, index=pd.date_range(start='2017-01-01', end='2017-12-31'))


# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()
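
In `auto_arima`, `D` is the order of seasonal differencing; when it is left unset, it is chosen by a seasonality test. With `D=1` and `m=7`, the model works on the differenced series y_t - y_(t-7). The following is a minimal sketch of what that operation does to a repeating weekly pattern. The series is synthetic (not the Kaggle data) and the helper `seasonal_difference` is a made-up name for illustration.

```python
import numpy as np

# Hypothetical weekly pattern (one value per weekday), repeated for 8 weeks
weekly_pattern = np.array([10.0, 12.0, 13.0, 15.0, 18.0, 22.0, 25.0])
trend = 0.1 * np.arange(56)  # small linear trend
y = np.tile(weekly_pattern, 8) + trend

def seasonal_difference(series, lag):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return series[lag:] - series[:-lag]

diffed = seasonal_difference(y, lag=7)
print(diffed[:5])
```

The weekly pattern cancels out and only the trend increment over seven days remains, which is why a seasonal difference lets the model carry the weekly shape forward into the forecast.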

Answered 2021-08-27T16:02:48.783