
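For an assignment, we were asked to perform cross-validation on a CART model. I tried using the cvFit function from cvTools but got a strange error message. Here is a minimal example: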

library(rpart)
library(cvTools)
data(iris)
cvFit(rpart(formula=Species~., data=iris))

The error I see is:

Error in nobs(y) : argument "y" is missing, with no default

The traceback() output:

5: nobs(y)
4: cvFit.call(call, data = data, x = x, y = y, cost = cost, K = K, 
       R = R, foldType = foldType, folds = folds, names = names, 
       predictArgs = predictArgs, costArgs = costArgs, envir = envir, 
       seed = seed)
3: cvFit(call, data = data, x = x, y = y, cost = cost, K = K, R = R, 
       foldType = foldType, folds = folds, names = names, predictArgs = predictArgs, 
       costArgs = costArgs, envir = envir, seed = seed)
2: cvFit.default(rpart(formula = Species ~ ., data = iris))
1: cvFit(rpart(formula = Species ~ ., data = iris))

It looks like y is mandatory for cvFit.default. But:

> cvFit(rpart(formula=Species~., data=iris), y=iris$Species)
Error in cvFit.call(call, data = data, x = x, y = y, cost = cost, K = K,  : 
  'x' must have 0 observations

What am I doing wrong? Which package would let me cross-validate a CART tree without having to code it myself? (I am too lazy...)


2 Answers


The caret package makes cross-validation a snap:

> library(caret)
> data(iris)
> tc <- trainControl("cv",10)
> rpart.grid <- expand.grid(.cp=0.2)
> 
> (train.rpart <- train(Species ~., data=iris, method="rpart",trControl=tc,tuneGrid=rpart.grid))
150 samples
  4 predictors
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Cross-Validation (10 fold) 

Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 

Resampling results

  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.94      0.91   0.0798       0.12    

Tuning parameter 'cp' was held constant at a value of 0.2
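
To make it clearer what the Accuracy figure above measures, here is a rough sketch of 10-fold cross-validation written out by hand with only rpart and base R. It is not what caret runs internally: caret builds stratified folds, while this sketch assigns folds purely at random, and the seed and the variable names (folds, acc) are my own choices for illustration, so the numbers will not match the output above exactly.

```r
library(rpart)
data(iris)

set.seed(1)   # arbitrary seed, only so the fold assignment is reproducible
K <- 10

# Randomly assign each of the 150 rows to one of K equally sized folds
folds <- sample(rep(seq_len(K), length.out = nrow(iris)))

# For each fold: fit on the other K - 1 folds, score on the held-out fold
acc <- sapply(seq_len(K), function(k) {
  train <- iris[folds != k, ]
  test  <- iris[folds == k, ]
  fit   <- rpart(Species ~ ., data = train,
                 control = rpart.control(cp = 0.2))   # same cp as in the caret call
  pred  <- predict(fit, newdata = test, type = "class")
  mean(pred == test$Species)                          # held-out accuracy for this fold
})

mean(acc)   # cross-validated accuracy
sd(acc)     # spread across the folds
```
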
answered 2013-05-23T15:31:30.603

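Finally, I was able to get it to work. As Joran noted, the cost parameter needs to be adjusted. In my case I am using 0/1 loss, which means I use a simple function that evaluates != instead of - between y and yHat. Also, predictArgs must include c(type='class'); otherwise the predict call used internally returns a vector of class probabilities instead of the most probable class. To sum up: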

library(rpart)
library(cvTools)
data(iris)
cvFit(rpart, formula=Species~., data=iris,
      cost=function(y, yHat) (y != yHat) + 0, predictArgs=c(type='class'))

(This uses another variant of cvFit. Additional arguments to rpart can be passed by setting the args= parameter.)
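
The two points above (why type='class' is needed, and what the 0/1 cost computes) can be checked outside of cvFit with a small sketch using only rpart. The names probs, labels and zero_one are mine, introduced for illustration, and the error rate computed here is on the training data, not a cross-validated estimate.

```r
library(rpart)
data(iris)

fit <- rpart(Species ~ ., data = iris)

# Default prediction for a classification tree: a matrix of class probabilities,
# one row per observation and one column per class
probs <- predict(fit, newdata = iris)
dim(probs)

# With type = "class": a factor holding the most probable class per observation
labels <- predict(fit, newdata = iris, type = "class")
head(labels)

# 0/1 loss: 1 for a misclassified observation, 0 for a correct one
zero_one <- function(y, yHat) (y != yHat) + 0

# Averaging the 0/1 loss gives the misclassification rate
mean(zero_one(iris$Species, labels))
```

Subtracting y and yHat, as a squared-error style cost would, is not meaningful for factors, which is why the comparison with != is used instead.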

answered 2013-05-23T22:08:50.163