I have a generic dataset for my forecast that includes worldwide data.

    ds                    y     country_id
    01/01/2021 09:00:00   5.0   1
    01/01/2021 09:10:00   5.2   1
    01/01/2021 09:20:00   5.4   1
    01/01/2021 09:30:00   6.1   1
    01/01/2021 09:00:00   2.0   2
    01/01/2021 09:10:00   2.2   2
    01/01/2021 09:20:00   2.4   2
    01/01/2021 09:30:00   3.1   2



    playoffs = pd.DataFrame({
        'holiday': 'playoff',
        'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                              '2010-01-24', '2010-02-07', '2011-01-08',
                              '2013-01-12', '2014-01-12', '2014-01-19',
                              '2014-02-02', '2015-01-11', '2016-01-17',
                              '2016-01-24', '2016-02-07']),
        'lower_window': 0,
        'upper_window': 1,
    })
    superbowls = pd.DataFrame({
        'holiday': 'superbowl',
        'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
        'lower_window': 0,
        'upper_window': 1,
    })
    holidays = pd.concat((playoffs, superbowls))

Now I want to add holidays to the model.

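    m = NeuralProphet(holidays=holidays)
    m.add_country_holidays(country_name='US')
    m.fit(df)

  1. How do I add holidays for multiple countries with add_country_holidays (m.add_country_holidays)?
  2. How do I add country-specific holidays to the holidays data?
  3. Do I need a separate model for each country, or is a single model for the whole dataset fine, with regressors added afterwards? What is the recommendation?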
1 Answer

Here is one possible solution.

The program:

# NOTE 1: tested on google colab

# Un-comment the following (!pip) line if you need to install the libraries 
# on google colab notebook:

#!pip install neuralprophet pandas numpy holidays

import pandas as pd
import numpy as np
import holidays
from neuralprophet import NeuralProphet
import datetime


# NOTE 2: Most of the code comes from:
# https://neuralprophet.com/html/events_holidays_peyton_manning.html

# Context:
# We use the time series of the log daily page views of the Wikipedia page
# for Peyton Manning (a former American football quarterback) as an example.
# During playoffs and Super Bowls, Peyton Manning's wiki page is viewed more
# often. We would like to see whether country-specific holidays also have an
# influence.

# First, we load the data:

data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "wp_log_peyton_manning.csv")

# To simulate your case, we add a country_id column filled with random values in {1, 2}.
# Let's assume US=1 and Canada=2.

np.random.seed(0)
df['country_id'] = np.random.randint(1, 2 + 1, len(df))

print("The dataframe we are working on:")
print(df.head())


# We would like to add holidays for the US and Canada to see whether holidays
# influence the number of daily views of Manning's wiki page.

# The data in df starts in 2007 and ends in 2016:
StartingYear = 2007
LastYear = 2016
# Holidays for the US and Canada:
US_holidays = holidays.US(years=list(range(StartingYear, LastYear + 1)))
CA_holidays = holidays.CA(years=list(range(StartingYear, LastYear + 1)))

# Parse the date strings once so they can be checked against the holiday calendars:
dates = pd.to_datetime(df['ds'])

# True where the row belongs to the US (country_id 1) and its date is a US holiday:
is_US_holiday = (df['country_id'] == 1) & dates.apply(lambda d: d in US_holidays)
# True where the row belongs to Canada (country_id 2) and its date is a Canadian holiday:
is_CA_holiday = (df['country_id'] == 2) & dates.apply(lambda d: d in CA_holidays)

# DataFrame.append was removed in pandas 2.0, so each events frame is built
# in one step from the matching rows instead of being grown row by row:
holidays_US = pd.DataFrame({'ds': dates[is_US_holiday], 'event': 'holiday_US'}).reset_index(drop=True)
holidays_CA = pd.DataFrame({'ds': dates[is_CA_holiday], 'event': 'holiday_CA'}).reset_index(drop=True)

# Now we can drop the country_id in df:
df.drop('country_id', axis=1, inplace=True)


print("Days in df that are holidays in the US:")
print(holidays_US.head())
print()
print("Days in df that are holidays in Canada:")
print(holidays_CA.head())


# user specified events
# history events
playoffs = pd.DataFrame({
    'event': 'playoff',
    'ds': pd.to_datetime([
        '2008-01-13', '2009-01-03', '2010-01-16',
        '2010-01-24', '2010-02-07', '2011-01-08',
        '2013-01-12', '2014-01-12', '2014-01-19',
        '2014-02-02', '2015-01-11', '2016-01-17',
        '2016-01-24', '2016-02-07',
    ]),
})

superbowls = pd.DataFrame({
    'event': 'superbowl',
    'ds': pd.to_datetime([
        '2010-02-07', '2012-02-05', '2014-02-02', 
        '2016-02-07',
    ]),
})


# Create the events_df:
events_df = pd.concat((playoffs, superbowls, holidays_US, holidays_CA))

# Create neural network and fit:
# NeuralProphet Object
m = NeuralProphet(loss_func="MSE")
m = m.add_events("playoff")
m = m.add_events("superbowl")
m = m.add_events("holiday_US")
m = m.add_events("holiday_CA")


# create the data df with events
history_df = m.create_df_with_events(df, events_df)

# fit the model
metrics = m.fit(history_df, freq="D")

# forecast with events known ahead
future = m.make_future_dataframe(df=history_df, events_df=events_df, periods=365, n_historic_predictions=len(df))
forecast = m.predict(df=future)


fig = m.plot(forecast)
fig_param = m.plot_parameters()
fig_comp = m.plot_components(forecast)

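Result: the results (see the parameters plot) seem to suggest that there are fewer views on holidays, in both the US and Canada. Does that make sense? Maybe... people on holiday seem to have more interesting things to do than browse Manning's wiki page :-) I don't know.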
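A rough way to cross-check that reading without the model is to compare the mean of y on event days with the mean on ordinary days. The sketch below is only an illustration: the `views` and `events` frames hold made-up toy values (not the Peyton Manning series), and `mean_by_event` is a hypothetical helper name, not part of NeuralProphet.

```python
import pandas as pd

# Toy stand-in data (made-up values, NOT the Peyton Manning series).
views = pd.DataFrame({
    "ds": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]),
    "y": [5.0, 7.0, 9.0, 6.0],
})
# Same layout as events_df in the program: one row per (date, event label).
events = pd.DataFrame({
    "ds": pd.to_datetime(["2021-01-01", "2021-01-04"]),
    "event": ["holiday_US", "holiday_CA"],
})


def mean_by_event(views, events):
    """Hypothetical helper: mean y per event label; non-event days are labelled 'none'."""
    # A left merge keeps every observation; days without an event get NaN.
    merged = views.merge(events, on="ds", how="left")
    merged["event"] = merged["event"].fillna("none")
    return merged.groupby("event")["y"].mean()


summary = mean_by_event(views, events)
print(summary)
```

With the real data, `views` would be `df` and `events` would be `events_df`, provided both `ds` columns have the same dtype. Raw means ignore trend and seasonality, and a day carrying several event labels is counted once under each label, so this is only a plausibility check next to the fitted event coefficients.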

Program output:

The dataframe we are working on:
           ds       y  country_id
0  2007-12-10  9.5908           1
1  2007-12-11  8.5196           2
2  2007-12-12  8.1837           2
3  2007-12-13  8.0725           1
4  2007-12-14  7.8936           2
Days in df that are holidays in the US:
           ds       event
0  2007-12-25  holiday_US
1  2008-01-21  holiday_US
2  2008-07-04  holiday_US
3  2008-11-27  holiday_US
4  2008-12-25  holiday_US

Days in df that are holidays in Canada:
           ds       event
0  2008-01-01  holiday_CA
1  2008-02-18  holiday_CA
2  2008-08-04  holiday_CA
3  2008-09-01  holiday_CA
4  2008-10-13  holiday_CA

INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set epochs to 138

88%
241/273 [00:02<00:00, 121.69it/s]

INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.51E+00

88%
241/273 [00:02<00:00, 123.87it/s]

INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.63E+00

89%
242/273 [00:02<00:00, 121.58it/s]

INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.62E-02, min: 2.58E+00
INFO - (NP.forecaster._init_train_loader) - lr-range-test selected learning rate: 3.44E-02
Epoch[138/138]: 100%|██████████| 138/138 [00:29<00:00,  4.74it/s, MSELoss=0.012, MAE=0.344, RMSE=0.478, RegLoss=0]

Figures:

Forecast: [image]

Parameters: [image]

Components: [image]

Answered 2021-12-16T14:44:33.403