
I am trying to use MAPE as the metric for evaluating model performance.

With LOOCV and parallel execution everything works fine, but if I use another resampling method I get this error:

Error in { : task 1 failed - "could not find function "mape""

With serial execution, by contrast, the problem disappears.

The code below provides an example.

    library(caret)
    library(doParallel)

    data("environmental")

    registerDoParallel(makeCluster(detectCores(), outfile = ''))



    mape <- function(y, yhat) mean(abs((y - yhat)/y))

    mapeSummary <- function (data, lev = NULL, model = NULL) {

                       out <- mape(data$obs, data$pred)
                       names(out) <- "MAPE"

                       out
                     }



    #LOOCV - parallel
    trControlLoocvPar <- trainControl(allowParallel = T,
                                      verboseIter = T, 
                                      method = "LOOCV",
                                      summaryFunction = mapeSummary)

    #LOOCV - serial
    trControlLoocvSer <- trainControl(allowParallel = F,
                                      verboseIter = T, 
                                      method = "LOOCV",
                                      summaryFunction = mapeSummary)

    #Bootstrapping - parallel
    trControlBootPar <- trainControl(allowParallel = T,
                                      verboseIter = T, 
                                      method = "boot",
                                      summaryFunction = mapeSummary)

    #Bootstrapping - serial
    trControlBootSer <- trainControl(allowParallel = F,
                                      verboseIter = T, 
                                      method = "boot",
                                      summaryFunction = mapeSummary)


    trControlList <- list(trControlLoocvSer, 
                          trControlLoocvPar,
                          trControlBootSer,
                          trControlBootPar)


    models <- lapply(trControlList, 
                     function(control) {

                       train(y = environmental$ozone,
                       x = environmental[, -1], 
                       method = "glmnet", 
                       trControl = control, 
                       metric = "MAPE", 
                       maximize = FALSE)
                     })

My OS is El Capitan 10.11.4 and the version of caret is 6.0.62.


1 Answer


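As the message says, your parallel worker processes cannot find the mape function. It is defined in the global environment of your main session, which is not copied to the workers.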

The simplest solution is to define the mape function inside the mapeSummary function, as shown below. Your parallel processes will then work correctly.

    mapeSummary <- function (data, lev = NULL, model = NULL) {
      mape <- function(y, yhat) mean(abs((y - yhat)/y))
      out <- mape(data$obs, data$pred)
      names(out) <- "MAPE"

      out
    }
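
To see why this works without running a full caret fit, the self-contained summary function can be sent to fresh worker processes directly. This is only a minimal sketch: the `toy` data frame and the cluster size of 2 are made-up assumptions, and `clusterCall` from the parallel package stands in for what caret does internally through foreach.

```r
library(parallel)

# Summary function with the helper defined inside it, as in the answer.
mapeSummary <- function(data, lev = NULL, model = NULL) {
  mape <- function(y, yhat) mean(abs((y - yhat) / y))
  out <- mape(data$obs, data$pred)
  names(out) <- "MAPE"
  out
}

# Hypothetical toy data using the obs/pred column names read by mapeSummary.
toy <- data.frame(obs = c(10, 20, 40), pred = c(11, 18, 40))

# Start two fresh workers; nothing has been exported to them.
cl <- makePSOCKcluster(2)

# Each worker receives mapeSummary and evaluates it on the toy data.
worker_results <- clusterCall(cl, mapeSummary, data = toy)

stopCluster(cl)

print(worker_results[[1]])
```

Because mape is created each time mapeSummary is called, the workers never need to look it up in their own global environment.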

Bonus:

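You can also use clusterEvalQ, one of the clusterApply family of functions, to define mape on every worker. This is shown below, but it is not the most elegant solution and requires more typing: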

    cl <- makePSOCKcluster(detectCores()-1)
    clusterEvalQ(cl, mape <- function(y, yhat) mean(abs((y - yhat)/y)))
    registerDoParallel(cl)

    mapeSummary <- function (data, lev = NULL, model = NULL) {
      out <- mape(data$obs, data$pred)
      names(out) <- "MAPE"
      out
    }

    #Bootstrapping - parallel
    trControlBootPar <- trainControl(allowParallel = T,
                                     verboseIter = T, 
                                     method = "boot",
                                     summaryFunction = mapeSummary)

    train(y = environmental$ozone,
          x = environmental[, -1], 
          method = "glmnet", 
          trControl = trControlBootPar, 
          metric = "MAPE", 
          maximize = FALSE)

    stopCluster(cl)
    registerDoSEQ()
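
If mape is already defined in the main session, `clusterExport` from the parallel package is another way to copy it to the workers without retyping the definition. The sketch below is not part of the original answer; the cluster size of 2 and the sample vectors are assumptions chosen only to make the effect visible.

```r
library(parallel)

mape <- function(y, yhat) mean(abs((y - yhat) / y))

cl <- makePSOCKcluster(2)

# Fresh workers do not know about mape yet.
found_before <- unlist(clusterEvalQ(cl, exists("mape")))

# Copy mape from the current environment into each worker's global environment.
clusterExport(cl, "mape", envir = environment())
found_after <- unlist(clusterEvalQ(cl, exists("mape")))

# Each worker can now evaluate mape on its own.
worker_values <- unlist(clusterEvalQ(cl, mape(c(1, 2, 4), c(1, 1, 2))))

stopCluster(cl)

print(found_before)
print(found_after)
print(worker_values)
```

In a caret workflow, the export would go between creating the cluster and calling `registerDoParallel(cl)`, in the same place as the clusterEvalQ call above.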

Answered 2016-05-04T12:56:05.620