
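Prophet users of the world, hope all is well. I'm having some trouble with a particular use case, which I'll try to illustrate with the sample data and code below. First, let's generate some sample data so it's easier to follow what I'm talking about.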

library(data.table)
library(prophet)
library(dplyr)

# one year of months to be used for generating predictions
ds = c('2016-01-01', '2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-01','2016-07-01','2016-08-01','2016-09-01','2016-10-01','2016-11-01','2016-12-01' )

# historical customer counts
y = c (78498,12356,93732,5556,410,10296,9779,744,16407,100484,23954,141398,10575,850,16334,17496,1643,28074,93181,
       18770,129968,11590,850,16738,17510,1376,27931,94369,18444,134850,13386,919,19075,18050,1565,31296,112094,27995,
       167094,13402,1422,22766,20072,2340,37863,87346,16180,119863,7691,725,16931,12163,1241,25872,87455,16322,116390,
       6994,620,13524,11059,990,22188,105473,23652,154145,13520,1008,18857,19209,1632,31105,102252,21284,138779,11670,
       918,16078,16679,1257,26755,115033,22415,139835,13965,936,18027,18642,1407,28622,155371,40556,174321,25119,1859,
       35326,28844,2962,51582,108817,19158,109864,8693,756,14358,13390,1091,21419)

# the segment channels of the customers
segment_channel = c('Existing_Omni', 'Existing_Retail', 'Existing_Direct', 'NTB_Omni', 'NTB_Retail', 'NTB_Direct', 'React_Omni', 'React_Retail', 'React_Direct')

# an external regressor to be added to the model (my real data has about 40 of these regressor variables that I would like to add)
flash_sale = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
               2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3)

fake_data = merge(ds,segment_channel, all.y=TRUE)
setnames(fake_data, 'x', 'ds')
setnames(fake_data, 'y', 'segment_channel')
nrow(fake_data) # should be 108 rows: the 9 customer segments for each of the 12 months in 2016

# next join the known customer counts; let's say we have them for the first 7 months of the year (January through July)

fake_data = cbind(fake_data, y)
fake_data = cbind(fake_data, flash_sale)

# set some of the y values to NA so we can pretend we are trying to predict them using the ds time series as well as the flash sale values,
# which will be known in advance

fake_data = as.data.table(fake_data)
fake_data$ds = as.Date(fake_data$ds)
fake_data[, y := ifelse(ds >= '2016-08-01', NA, y)]

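This code generates a dataset very similar to the one I'm working with, so hopefully you can reproduce what I'm doing. Essentially, I want to do two things with this data. The first is fairly straightforward: I want to explicitly add a regressor (flash_sale in this example) to the Prophet model I create. I can do that easily enough, as follows: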

christ <- tibble(
  holiday = 'christ',
  ds = as.Date(c('2016-11-01', '2017-11-01', '2018-11-01',
                 '2019-11-01')),
  lower_window = 0,
  upper_window = 1
)

nye <- tibble(
  holiday = 'nye',
  ds = as.Date(c('2016-11-01', '2017-12-01', '2018-11-01',
                 '2019-11-01')),
  lower_window = 0,
  upper_window = 1
)

holidays <- bind_rows(nye, christ)

m <- prophet(holidays = holidays)
m<- add_regressor(m, name = "flash_sale")
m <- fit.prophet(m, fake_data)
forecast <- predict(m, fake_data)


prophet_plot_components(m, forecast)
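
With around 40 regressors, as mentioned in the data setup, the add_regressor call could be repeated in a loop rather than written out line by line. The following is only a sketch on toy data: the data frame toy, the column promo_email, and the vector regressor_names are hypothetical stand-ins, not part of my real dataset.

```r
library(prophet)

# Toy monthly series with two regressor columns (promo_email is hypothetical)
set.seed(1)
toy <- data.frame(
  ds = seq(as.Date("2015-01-01"), by = "month", length.out = 24),
  flash_sale = rep(c(1, 2), times = 12),
  promo_email = rep(c(0, 0, 1), times = 8)
)
toy$y <- 100 + 10 * toy$flash_sale + 5 * toy$promo_email + rnorm(24)

# Hypothetical: names of the regressor columns present in the data
regressor_names <- c("flash_sale", "promo_email")

# No data frame is passed here, so the model is created but not fitted yet
m <- prophet()

# Register every regressor before fitting
for (nm in regressor_names) {
  m <- add_regressor(m, name = nm)
}

m <- fit.prophet(m, toy)

# The data frame used for prediction must contain every regressor column
forecast <- predict(m, toy)
```

The same loop should apply unchanged to a longer regressor_names vector, as long as every named column exists in both the fitting and the prediction data.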

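The components plot is fairly ugly, but it's easy to see that this works for the given data, and I can add more lines to add additional regressors. OK, so far so good. But the other problem is that I'm dealing with 9 segment channels, and I don't want to build a separate model for each one. Luckily, I found a nice Stack Overflow link that does grouped Prophet forecasts: Using Prophet Package to Predict By Group in Dataframe in R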

fcst = fake_data %>%  
  group_by(segment_channel) %>%
  do(predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034), make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
  dplyr::select(ds, segment_channel, yhat)

fcst
> fcst
# A tibble: 207 x 3
# Groups:   segment_channel [9]
   ds                  segment_channel   yhat
   <dttm>              <fct>            <dbl>
 1 2016-01-01 00:00:00 Existing_Direct 38712.
 2 2016-02-01 00:00:00 Existing_Direct 40321.
 3 2016-03-01 00:00:00 Existing_Direct 42648.
 4 2016-04-01 00:00:00 Existing_Direct 45130.
 5 2016-05-01 00:00:00 Existing_Direct 46580.
 6 2016-06-01 00:00:00 Existing_Direct 49437.
 7 2016-07-01 00:00:00 Existing_Direct 50651.
 8 2016-08-01 00:00:00 Existing_Direct 52685.
 9 2016-09-01 00:00:00 Existing_Direct 54719.
10 2016-10-01 00:00:00 Existing_Direct 56687.
# ... with 197 more rows

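This is more or less exactly what I want! Cool. So now all I have to do is figure out how to get my grouped forecast and my regressors added in one step. I know I can include multi-line statements in do, so this is what I tried in order to make it work: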

> fcst = fake_data %>%  
+   group_by(segment_channel) %>%
+   do(
+     predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034), 
+     add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+     fit.prophet(prophet(. , holidays = holidays)),
+     make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
+   dplyr::select(ds, segment_channel, yhat)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Error in add_regressor(prophet(., holidays = holidays), name = "flash_sale") : 
  Regressors must be added prior to model fitting.
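
A possible reading of this error: calling prophet() with a data frame fits the model immediately, so the add_regressor call inside do receives an already fitted model. The sketch below tries to reproduce that on a toy data frame (toy is a hypothetical stand-in for one segment's data), and contrasts it with a model created without data.

```r
library(prophet)

# Toy stand-in for one segment's history
set.seed(2)
toy <- data.frame(
  ds = seq(as.Date("2016-01-01"), by = "month", length.out = 12),
  flash_sale = rep(c(1, 2, 1), times = 4)
)
toy$y <- 500 + 50 * toy$flash_sale + rnorm(12)

# Passing a data frame fits the model right away
fitted_model <- prophet(toy)

# Adding a regressor to the fitted model fails; capture the message
result <- tryCatch(
  add_regressor(fitted_model, name = "flash_sale"),
  error = function(e) conditionMessage(e)
)

# Creating the model without data leaves it unfitted,
# so the regressor can still be registered
unfitted_model <- prophet()
unfitted_model <- add_regressor(unfitted_model, name = "flash_sale")
```

If this reading is right, every prophet(., ...) call in my attempt produces a fitted model, which would explain why the regressor step is rejected.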

Damn. It looks like it's running, but something about how I'm trying to add the regressor isn't kosher. Next I tried it this way:

> fcst = fake_data %>%  
+   group_by(segment_channel) %>%
+   do(
+     prophet(holidays = holidays),
+     add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+     fit.prophet(prophet(. , holidays = holidays)),
+     predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+     make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
+   dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 4
Call `rlang::last_error()` to see a backtrace
> fcst = fake_data %>%  
+   group_by(segment_channel) %>%
+   do(
+     add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+     fit.prophet(prophet(. , holidays = holidays)),
+     predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+     make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
+   dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 3
Call `rlang::last_error()` to see a backtrace

At this point I'm pretty confused, so I'm just hoping someone on the internet might know the right incantation to get me where I need to go.
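
One arrangement that might work, offered as a sketch rather than a confirmed answer: do accepts a single unnamed expression, which matches the "Can only supply one unnamed argument" errors, so the build, add_regressor, fit, and predict steps would need to live inside one function (or one braced block) that is called once per group. The helper forecast_segment and the data frame toy below are hypothetical; holidays and the other prophet() arguments from my attempts are left out of the toy version for brevity and could be passed to the prophet() call inside the helper.

```r
library(dplyr)
library(prophet)

# Toy stand-in for fake_data: 2 segments x 12 months, last 4 months of y unknown
months <- seq(as.Date("2016-01-01"), by = "month", length.out = 12)
segments <- c("Existing_Omni", "NTB_Retail")
set.seed(42)
toy <- data.frame(
  ds = rep(months, times = length(segments)),
  segment_channel = rep(segments, each = length(months)),
  flash_sale = rep(c(1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1), times = length(segments)),
  stringsAsFactors = FALSE
)
toy$y <- 1000 + 200 * toy$flash_sale + rnorm(nrow(toy), sd = 20)
toy$y[toy$ds >= as.Date("2016-09-01")] <- NA

# Hypothetical helper: one complete Prophet workflow for a single segment
forecast_segment <- function(df) {
  # No data frame passed, so the model stays unfitted at this point
  m <- prophet(seasonality.mode = "multiplicative",
               changepoint.prior.scale = 0.034)
  m <- add_regressor(m, name = "flash_sale")
  # Fit only on the rows where y is known
  m <- fit.prophet(m, df[!is.na(df$y), ])
  # Predict on all rows: the future flash_sale values are already in df,
  # so make_future_dataframe is not used here
  fc <- predict(m, df)
  fc[, c("ds", "yhat")]
}

fcst <- toy %>%
  group_by(segment_channel) %>%
  do(forecast_segment(.)) %>%
  ungroup()
```

Predicting on the group's own rows instead of a make_future_dataframe result is a deliberate choice in this sketch: a frame built by make_future_dataframe contains only ds, so it would lack the flash_sale column that the fitted model expects.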
