r - Error when trying to cross-validate a gamlss model

Question

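I am trying to run 5-fold cross-validation on a model that I estimated with the gamlss package. When I use the same code to estimate a different model (e.g. OLS), there is no problem. However, as soon as I change the model to gamlss, I get an error message.

Here is an illustrative example: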

# load packages and data
library(caret)
library(gamlss)
data(usair)

# create 5 folds
folds <- createFolds(usair$y, k = 5)
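For readers unfamiliar with the object used below: `createFolds` returns a list with one element per fold, each holding the row indices that are held out in that fold. The following is a minimal base-R sketch of the same idea. `make_folds` is a hypothetical helper name, the size 41 is only an example, and unlike `createFolds` (which stratifies a numeric outcome by percentile groups) this version simply shuffles the rows.

```r
# Base-R sketch of k-fold index creation (unstratified, unlike caret::createFolds).
# make_folds is a hypothetical helper, not part of caret or gamlss.
make_folds <- function(n, k = 5) {
  stopifnot(n >= k, k >= 2)
  # shuffle the row indices, then deal them out to the k folds in turn
  shuffled <- sample(seq_len(n))
  fold_id  <- rep(seq_len(k), length.out = n)
  folds <- split(shuffled, fold_id)
  names(folds) <- paste0("Fold", seq_len(k))
  folds
}

set.seed(1)
folds_demo <- make_folds(n = 41, k = 5)
lengths(folds_demo)  # number of held-out rows per fold
```

Each element can then be used exactly like `x` in the `lapply` calls of this question: `data[-x, ]` for training and `data[x, ]` for testing.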

When I run the following code, everything works fine and I get a list containing my performance measures for each fold:

### 1) OLS
# estimate model 5 times and get performance measures
res1 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- lm(y~ x1 + x2 + x3 + x4 + x5 + x6,
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})
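Since the same four measures are recomputed in every variant of this loop, they could be factored into a small function. This is only a sketch: `perf_measures` is a hypothetical name, and the formulas are copied unchanged from the code above.

```r
# perf_measures is a hypothetical helper; formulas match the loop body above.
perf_measures <- function(y_true, y_pred) {
  stopifnot(length(y_true) == length(y_pred))
  err <- y_true - y_pred
  mse <- mean(err^2)
  list(
    MAE  = mean(abs(err)),                 # Mean Absolute Error
    MSE  = mse,                            # Mean Squared Error
    MAPE = 100 * mean(abs(err) / y_true),  # breaks down if any y_true is 0
    R2   = 1 - mse / var(y_true)           # note: var() divides by n - 1
  )
}

# toy values, for illustration only
perf_measures(y_true = c(10, 20, 30), y_pred = c(12, 18, 33))
```

Because `var()` uses the n - 1 denominator while the MSE divides by n, this R2 differs slightly from 1 - SSE/SST, and the difference is more noticeable on small test folds.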

However, when I run the following code, with the model type changed to gamlss, I get an error message:

### 2) gamlss
# estimate model 5 times and get performance measures
res2 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- gamlss(y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})

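The error message is: "Error in eval(substitute(Data)) : object 'trainset' not found". I have run the code inside the function separately for each fold, and it works. It seems that the training and test sets can no longer be found once the code runs inside the function, yet all I did was change the model.

Does anyone know what the problem might be here?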

score 1 · Accepted Answer

You need to pass the formula as a named argument (`formula=`), like this:

res2 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})

Result:

> # estimate model 5 times and get performance measures
> res2 <- lapply(folds, function(x) {
+   # Create training and test data set
+   trainset <- usair[-x, ]
+   testset <- usair[x, ]
+   # estimate the model with the training data set
+   m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
+               data=trainset)
+   # predict outcomes with the test data set
+   y_pred <- predict(m1, newdata = testset)
+   # store the actual outcome values in a vector
+   y_true <- testset$y
+   # Store performance measures
+   MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
+   MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
+   MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
+   R2 <- 1-MSE/var(y_true)
+   list(MAE=MAE,
+        MSE=MSE,
+        MAPE=MAPE,
+        R2= R2)
+ })
GAMLSS-RS iteration 1: Global Deviance = 281.937 
GAMLSS-RS iteration 2: Global Deviance = 281.9348 
GAMLSS-RS iteration 3: Global Deviance = 281.9348
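Once the loop runs, `res2` is a list with one list of measures per fold, so a natural next step is to combine the folds into a table and average them. The sketch below uses `res_demo`, a made-up stand-in with the same shape as `res1`/`res2`. Its numbers are purely illustrative and are not results from the usair data.

```r
# res_demo is made-up illustrative data shaped like res1 / res2 above
res_demo <- list(
  Fold1 = list(MAE = 10, MSE = 150, MAPE = 40, R2 = 0.30),
  Fold2 = list(MAE = 12, MSE = 200, MAPE = 50, R2 = 0.20)
)

# one row per fold, one column per performance measure
fold_table <- do.call(rbind, lapply(res_demo, as.data.frame))
fold_table

# cross-validated average of each measure
colMeans(fold_table)
```

Replacing `res_demo` with `res2` should give the per-fold table for the gamlss model, assuming every fold returned all four measures.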
