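I get an error when I try to use rfe with lrFuncs. I tried the suggestions I found for similar problems, but they did not solve mine. Let's take the GermanCredit dataset from the caret package as an example. In this dataset, all factors (except the target variable Class) have already been converted to binary numeric variables, so we don't need to worry about using model.matrix.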
> library(caret)
> data(GermanCredit)
> GCrfe <- rfe(GermanCredit[,c(1:9,11:62)], GermanCredit[,10], sizes=(1:50), rfeControl=rfeControl(functions=lrFuncs))
Error in { :
task 1 failed - "rfe is expecting 61 importance values but only has 48"
OK, so I look for variables with no variance (excluding the target variable Class), i.e. variables with only one unique value, and remove them.
> variableVariance <- sapply(GermanCredit[-10], function(x) length(unique(x)))
> which(variableVariance==1)
      Purpose.Vacation Personal.Female.Single
                    26                     44
> GermanCredit <- GermanCredit[-grep('Purpose.Vacation', names(GermanCredit))]
> GermanCredit <- GermanCredit[-grep('Personal.Female.Single', names(GermanCredit))]
Now I look for highly correlated variables and get rid of the "duplicates".
> Cor <- abs(cor(GermanCredit[-10]))
> diag(Cor) <- 0
> which(Cor > 0.8, arr.ind=T)
                           row col
OtherInstallmentPlans.None  52  50
OtherInstallmentPlans.Bank  50  52
> GermanCredit <- GermanCredit[-grep('OtherInstallmentPlans.Bank', names(GermanCredit))]
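A pairwise correlation check like the one above only finds columns that duplicate each other two at a time. It cannot find exact linear dependencies that involve three or more columns, for example a complete set of dummy columns for one factor, which always sums to 1 and so duplicates the intercept. The following is a minimal sketch on made-up data (the `toy` data frame and its columns `d1`, `d2`, `d3`, `x` are hypothetical, not taken from GermanCredit) of how such a dependency can be detected with base R's `qr()`:

```r
# Toy data (made up for illustration): d1, d2, d3 are the dummy columns
# of a three-level factor, so d1 + d2 + d3 == 1 in every row.
level <- rep(c("a", "b", "c"), each = 10)
toy <- data.frame(
  d1 = as.numeric(level == "a"),
  d2 = as.numeric(level == "b"),
  d3 = as.numeric(level == "c"),
  x  = rep(1:10, times = 3)
)

# The pairwise check finds nothing above the 0.8 threshold ...
Cor <- abs(cor(toy))
diag(Cor) <- 0
max(Cor)

# ... yet the design matrix with an intercept column is rank deficient.
X <- cbind(Intercept = 1, as.matrix(toy))
qrX <- qr(X)
ncol(X)    # number of columns in the design matrix
qrX$rank   # numerical rank, smaller than ncol(X) here

# qr() moves linearly dependent columns to the end of the pivot order.
dependent <- colnames(X)[qrX$pivot[seq_len(ncol(X)) > qrX$rank]]
dependent
```

On the real data, caret's `findLinearCombos()` may be a more convenient way to run the same kind of check; whether GermanCredit actually contains such dependencies is not established by anything shown here.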
If I try rfe now, I still get the same error.
> GCrfe <- rfe(GermanCredit[,c(1:9,11:59)], GermanCredit[,10], sizes=(1:50), rfeControl=rfeControl(functions=lrFuncs))
Error in { :
task 1 failed - "rfe is expecting 58 importance values but only has 48"
> set.seed(12213)
> index <- createFolds(GermanCredit$Class, k=10, returnTrain=T)
> lrCtrl <- rfeControl(functions=lrFuncs, method='repeatedcv', index=index)
> GCrfe <- rfe(GermanCredit[,c(1:9,11:59)], GermanCredit[,10], sizes=(1:50), rfeControl=lrCtrl)
Error in { :
task 1 failed - "rfe is expecting 58 importance values but only has 48"
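The error message says that fewer importance values came back than there are predictors. One situation in which a logistic regression yields fewer coefficients than predictors is when some predictors are exact linear combinations of others: `glm()` then reports `NA` for the aliased coefficients, and `summary()` leaves them out of its coefficient table. The sketch below shows this on made-up data (`toy`, `d1`, `d2`, `d3`, `x`, `y` are hypothetical). Whether this is what happens inside lrFuncs for GermanCredit is an assumption, not something confirmed by the output above.

```r
# Toy data (made up): d3 is an exact linear combination of the
# intercept, d1 and d2, because d1 + d2 + d3 == 1 in every row.
level <- rep(c("a", "b", "c"), each = 10)
toy <- data.frame(
  y  = factor(rep(c("Bad", "Good"), times = 15)),
  d1 = as.numeric(level == "a"),
  d2 = as.numeric(level == "b"),
  d3 = as.numeric(level == "c"),
  x  = rep(1:10, times = 3)
)

fit <- glm(y ~ ., data = toy, family = binomial)

# The coefficient for d3 is NA because it cannot be estimated.
coef(fit)

# Compare the number of predictors with the number of estimated slopes.
n_predictors <- ncol(toy) - 1                 # every column except y
n_estimated  <- sum(!is.na(coef(fit)[-1]))    # drop the intercept
c(predictors = n_predictors, estimated = n_estimated)

# summary() omits the aliased term from its coefficient table.
rownames(coef(summary(fit)))
```

Running the same comparison on a `glm()` fit of the full GermanCredit predictors would show whether the gap between 58 and 48 corresponds to `NA` coefficients.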
I would appreciate any help in solving this problem and in understanding why this error occurs.