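I am trying to estimate a series of ARIMA models in a loop, passing in a different dependent variable from a list on each iteration. I am trying to do this in R with the fable package, but I cannot work out how to pass the different variable names from the list into the dplyr pipeline.
I have a tsibble that looks like this: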
# A tsibble: 320 x 5 [1Q]
# Key:       age, sex [4]
   quarter   age   sex     var var_log
   <yearqtr> <fct> <fct> <dbl>   <dbl>
 1 1990 Q1   18-25 male   50      3.91
 2 1990 Q2   18-25 male   49.9    3.91
 3 1990 Q3   18-25 male   51.1    3.93
 4 1990 Q4   18-25 male   52.6    3.96
 5 1991 Q1   18-25 male   52.1    3.95
 6 1991 Q2   18-25 male   51.4    3.94
 7 1991 Q3   18-25 male   52.0    3.95
 8 1991 Q4   18-25 male   51.2    3.94
 9 1992 Q1   18-25 male   50.8    3.93
10 1992 Q2   18-25 male   51.7    3.95
# ... with 310 more rows
The data was generated with the following code:
library(zoo)
library(tsibble) # Needed for as_tsibble() below
set.seed(42)
quarter <- as.yearqtr(seq(as.Date("1990-01-01"), by="quarter", length.out = 80), format = "%Y-%m-%d")
age <- c('18-25', 'Over 25')
sex <- c('male', 'female')
df <- expand.grid(quarter, age, sex)
names(df) <- c('quarter', 'age', 'sex')
df$var <- NA
df[df$age=='18-25' & df$sex== 'male', ]$var <- cumsum(c(50, rnorm(n=nrow(df[df$age=='18-25' & df$sex== 'male', ])-1, mean =.1)))
df[df$age=='18-25' & df$sex== 'female', ]$var <- cumsum(c(60, rnorm(n=nrow(df[df$age=='18-25' & df$sex== 'female', ])-1, mean =.2)))
df[df$age=='Over 25' & df$sex== 'male', ]$var <- cumsum(c(50, rnorm(n=nrow(df[df$age=='Over 25' & df$sex== 'male', ])-1, mean = (-.1))))
df[df$age=='Over 25' & df$sex== 'female', ]$var <- cumsum(c(60, rnorm(n=nrow(df[df$age=='Over 25' & df$sex== 'female', ])-1, mean = (-.2))))
df$var_log <- log(df$var)
df <- as_tsibble(df, index=quarter, key=c('age', 'sex'))
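I am trying to write a function that takes a list of model specifications as its input and loops over them, estimating each model in turn, like this: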
library(dplyr)
library(fable)
library(ggplot2) # Provides the fortify() generic
select <- dplyr::select

estimate_models <-
  function(mdl_list, # A list of lists of model specifications
           mdl_data) # The tsibble to fit the models to (argument assumed; the body below uses it)
  {
    # This function is a single loop for estimating models.
    for (i in seq_along(mdl_list)) {
      # Extract model information -----------------------------------------------
      mdl_name <- mdl_list[[i]][["mdl"]]     # Name of model
      mdl_type <- sub("_.*", "", mdl_name)   # Type of model (the part before the first underscore)
      mdl_vars_ari <- mdl_list[[i]][["ari"]] # Name of the dependent variable for the ARIMA model, as a string
      # ARIMA model estimation --------------------------------------------------
      print(paste0("Estimating ", mdl_name, "..."))
      # Estimate the ARIMA model
      mdl_vars_ari_enquo <- enquo(mdl_vars_ari) # This is the step that does not work
      mdl <- mdl_data %>%
        model(arima = ARIMA(!!mdl_vars_ari_enquo)) %>%
        forecast(h = 28) %>%       # Forecast 28 periods ahead
        fortify() %>%              # Extract the forecast as a data frame
        filter(.level == 95) %>%   # Keep results where the confidence level is 95%
        mutate(mcv_fnl = exp(!!mdl_vars_ari_enquo), quarter = as.Date(quarter, format = "%y%m%d")) %>% # Take the exponent and convert the quarter column to 'Date'
        select(-contains(".")) %>% # Remove extra columns
        rbind(mdl_data)            # Bind the forecast rows to the model data
    }
  }
mdl_list, which contains the specification details, looks like this:
mdl_list <- list(list(mdl = "arima_model", ari = "var_log", d = "df"))
When I try to run the code, I get the following error:
model(mdl_data, arima=ARIMA(!!mdl_vars_ari_enquo))
Warning: 4 errors (1 unique) encountered for arima
[4] Could not find an appropriate ARIMA model.
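This seems to be related to how ARIMA(!!mdl_vars_ari_enquo) resolves the variable name argument. Passing in var_log directly works fine, but passing in mdl_vars_ari does not, which I think is because of dplyr's non-standard evaluation.
I have read Hadley Wickham's guide here: https://dplyr.tidyverse.org/articles/programming.html
but neither quo() nor enquo() seems to work. I have also tried as.name(), to no avail.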
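To show what I suspect is going on, here is a minimal sketch that uses only rlang, not fable. It illustrates the difference between a string and a symbol; it is not a confirmed description of what ARIMA() does internally. The names dep_var and toy are made up for the example. Injecting the string with !! puts a character constant into the expression, whereas converting it with sym() first puts in a reference to the column:

```r
library(rlang)

# Hypothetical stand-ins for mdl_list[[i]][["ari"]] and the tsibble
dep_var <- "var_log"
toy <- data.frame(var_log = c(3.91, 3.93, 3.96))

# Injecting the string itself: the call contains the character constant "var_log"
expr_string <- expr(mean(!!dep_var))

# Converting the string to a symbol first: the call refers to the column var_log
expr_symbol <- expr(mean(!!sym(dep_var)))

print(expr_string) # mean("var_log")
print(expr_symbol) # mean(var_log)

# Only the symbol version can be evaluated against the data
result <- eval_tidy(expr_symbol, data = toy)
print(result)
```

If the same holds inside model(), then ARIMA(!!sym(mdl_vars_ari)) would be the analogous call, but I have not verified that here.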
Please let me know if you need any more details to answer my question.