I'm working on a text classification task with mlr. I have written a custom filter as described here:

Creating custom filters

The filter works as expected, but when I try to use parallelization I get the following error:

Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 4; elements = 2.
Error in stopWithJobErrorMessages(inds, vcapply(result.list[inds], as.character)) : 
  Errors occurred in 2 slave jobs, displaying at most 10 of them:

00001: Error in parallel:::.slaveRSOCK() : 
  Assertion on 'method' failed: Must be element of set {'anova.test','carscore','cforest.importance','chi.squared','gain.ratio','information.gain','kruskal.test','linear.correlation','mrmr','oneR','permutation.importance','randomForest.importance','randomForestSRC.rfsrc','randomForestSRC.var.select','rank.correlation','relief','rf.importance','rf.min.depth','symmetrical.uncertainty','univariate','univariate.model.score','variance'}.

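From the error I assume that my custom filter needs to be an element of that set to have any chance of working in parallel, but I haven't managed to work out (a) whether that is possible and (b) if it is, how to do it.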

Thanks in advance for your help, Azam

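Addendum: test script. Because the data are sensitive, I can't show the actual script/data I'm using, but this example reproduces the error I'm seeing. Apart from the custom feature selection and the dataset, the steps for setting up the learner and evaluating it are the same as in my "real" script. As in my real case, the script runs as expected if you remove the parallelStartSocket() call.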

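I should also add that I have used parallel processing successfully (or at least without getting an error) when tuning the hyperparameters of an SVM with an RBF kernel; apart from the makeParamSet() definition, that script is identical.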

library(parallelMap)
library(mlr)
library(kernlab)

makeFilter(
  name = "nonsense.filter",
  desc = "Calculates scores according to alphabetical order of features",
  pkg = "mlr",
  supported.tasks = c("classif", "regr", "surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, decreasing = TRUE, ...) {
    feats = getTaskFeatureNames(task)
    imp = order(feats, decreasing = decreasing)
    names(imp) = feats
    imp
  }
)

# set up svm with rbf kernel
svm.lrn <- makeLearner("classif.ksvm", predict.type = "response")

# wrap learner with filter
svm.lrn <- makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter")

# define feature selection parameters 

ps.svm = makeParamSet(
  makeDiscreteParam("fw.abs", values = seq(2, 3, 1)) 

)

# define inner search and evaluation strategy
ctrl.svm = makeTuneControlGrid()
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE)

svm.lrn <- makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm, 
                           control = ctrl.svm)

# set up outer resampling
outer.svm <-  makeResampleDesc("CV", iters = 10, stratify = TRUE)

# run it...

parallelStartSocket(2)

run.svm <- resample(svm.lrn, iris.task, 
                    resampling = outer.svm, extract = getTuneResult)

parallelStop()
1 Answer

The problem is that makeFilter registers S3 methods, which are not available in separate R processes. You have two options to make this work: either simply use parallelStartMulticore(2), so that the workers are forked from the current R session and inherit everything defined in it (forking is not available on Windows), or tell parallelMap about the pieces that need to be present in the other R processes.
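To see why the two options behave differently, here is a minimal sketch using only base R's parallel package (not parallelMap or mlr). The object my.registry is a hypothetical stand-in for whatever session state makeFilter() sets up; the sketch assumes the machine can start local socket workers, and the fork check only runs on Unix-alikes. It shows that socket workers start without an object defined in the master session until it is explicitly exported to them, while forked workers inherit it.

```r
library(parallel)

# Hypothetical stand-in for session state created by makeFilter():
# an object that exists only in the R session where it was defined.
my.registry <- new.env()
assign("nonsense.filter", function(feats) order(feats), envir = my.registry)

# Socket ("PSOCK") workers are fresh R processes, so they start without it.
cl <- makeCluster(2, type = "PSOCK")
socket.before <- unlist(clusterEvalQ(cl, exists("my.registry")))

# Copying the object to every worker explicitly makes it visible there.
clusterExport(cl, "my.registry", envir = environment())
socket.after <- unlist(clusterEvalQ(cl, exists("my.registry")))
stopCluster(cl)

# Forked workers (the mechanism behind multicore mode; Unix-alikes only)
# are copies of the master session, so they inherit the object.
fork.seen <- if (.Platform$OS.type == "unix") {
  unlist(mclapply(1:2, function(i) exists("my.registry"), mc.cores = 2))
} else {
  NA
}

print(socket.before)  # FALSE FALSE
print(socket.after)   # TRUE TRUE
print(fork.seen)      # TRUE TRUE on Unix-alikes, NA elsewhere
```

The same reasoning applies to the filter: with a socket cluster, whatever the filter definition created must be recreated or shipped to each worker, whereas forked workers already have it.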

There are two parts to the second option: use parallelLibrary("mlr") to load mlr in every worker process, and move the filter definition into a separate file that can be loaded everywhere with parallelSource(). For example:

filter.R:

makeFilter(
  name = "nonsense.filter",
  desc = "Calculates scores according to alphabetical order of features",
  pkg = "mlr",
  supported.tasks = c("classif", "regr", "surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, decreasing = TRUE, ...) {
    feats = getTaskFeatureNames(task)
    imp = order(feats, decreasing = decreasing)
    names(imp) = feats
    imp
  }
)

main.R:

library(parallelMap)
library(mlr)
library(kernlab)

parallelStartSocket(2)

parallelLibrary("mlr")
parallelSource("filter.R")

# set up svm with rbf kernel
svm.lrn = makeLearner("classif.ksvm", predict.type = "response")

# wrap learner with filter
svm.lrn = makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter")

# define feature selection parameters 

ps.svm = makeParamSet(
  makeDiscreteParam("fw.abs", values = seq(2, 3, 1)) 

)

# define inner search and evaluation strategy
ctrl.svm = makeTuneControlGrid()
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE)

svm.lrn = makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm, 
                           control = ctrl.svm)

# set up outer resampling
outer.svm =  makeResampleDesc("CV", iters = 10, stratify = TRUE)

# run it...
run.svm = resample(svm.lrn, iris.task, resampling = outer.svm, extract = getTuneResult)

parallelStop()
answered 2017-02-11 21:06
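For comparison, a rough base-R analogue of what parallelLibrary() and parallelSource() accomplish might look like the sketch below. This is an assumption about the general mechanism, not parallelMap's actual implementation: it uses makeCluster(), clusterEvalQ() and clusterCall() from the parallel package, the tools package stands in for mlr, and filter.defs.R and nonsense.scores() are hypothetical names. The definitions file is sourced on every worker and in the master, after which a function sent to the workers can call the sourced definition.

```r
library(parallel)

# Hypothetical definitions file standing in for filter.R.
defs <- file.path(tempdir(), "filter.defs.R")
writeLines(c(
  "nonsense.scores <- function(feats, decreasing = TRUE) {",
  "  imp <- order(feats, decreasing = decreasing)",
  "  names(imp) <- feats",
  "  imp",
  "}"
), defs)

cl <- makeCluster(2, type = "PSOCK")

# Roughly the role of parallelLibrary(): attach a package on each worker
# (tools is only a stand-in for mlr here).
invisible(clusterEvalQ(cl, library(tools)))

# Roughly the role of parallelSource(): source the file on each worker ...
invisible(clusterCall(cl, source, defs))
# ... and in the master session, where the learner is built.
source(defs)

# Each worker can now call the sourced definition.
worker.results <- clusterCall(cl, function() nonsense.scores(c("b", "a", "c")))
stopCluster(cl)

print(worker.results[[1]])
```

Sourcing the file in the master as well matters because makeFilterWrapper() is called there and needs the filter to exist before anything is sent to the workers.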