2

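I want to write a small function that selects features automatically for logistic regression in R by brute force: test every subset of the predictors and evaluate each subset's classification performance with cross-validation (CV).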

Surprisingly, I could not find a package that performs this "all-subsets feature selection", so I would like to implement it myself.

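Unfortunately, my limited knowledge of R keeps me from writing a loop that generates all subsets of a given vector. Could someone point me in the right direction?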

4

3 Answers

5

Caveat incernor

The bestglm package is what you are after

The function bestglm selects the best subset of inputs for the glm family. The selection methods available include a variety of information criteria as well as cross-validation.

The vignette goes through a number of examples.

library(bestglm)
data(SAheart)
# using cross-validation for selection
out <- bestglm(SAheart, IC = 'CV', family = binomial, t = 10)
out
# CVd(d = 373, REP = 10)
# BICq equivalent for q in (0.190525988534159, 0.901583162187443)
# Best Model:
#                   Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)    -6.44644451 0.92087165 -7.000372 2.552830e-12
# tobacco         0.08037533 0.02587968  3.105731 1.898095e-03
# ldl             0.16199164 0.05496893  2.946967 3.209074e-03
# famhistPresent  0.90817526 0.22575844  4.022774 5.751659e-05
# typea           0.03711521 0.01216676  3.050542 2.284290e-03
# age             0.05046038 0.01020606  4.944159 7.647325e-07
Answered 2013-06-04T23:31:50.313
0

You can use paste() + combn(), e.g.

varnames <- c("a","b","c")
rhs <- unlist( lapply(seq_along(varnames), function(k) apply(combn(varnames,k),2,paste,collapse=" + ") ) )
# as.formula() converts one formula at a time, so build a list of formulas
formulae <- lapply( paste("z ~", rhs), as.formula )

... but perhaps there is a more elegant way?
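To connect this to the cross-validation part of the question, here is a minimal sketch that fits one logistic regression per subset and compares K-fold misclassification rates. It is only an illustration under stated assumptions: the data are simulated, the names dat, K, folds and cv_misclass are made up for this example, and the 0.5 cutoff is an arbitrary choice.

```r
# Simulated data for illustration only; a, b, c and z are made-up names
set.seed(1)
n <- 200
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$z <- rbinom(n, 1, plogis(1.5 * dat$a - dat$b))

# Every non-empty subset of the predictors, as one formula per subset
varnames <- c("a", "b", "c")
rhs <- unlist(lapply(seq_along(varnames), function(k) {
  combn(varnames, k, FUN = paste, collapse = " + ")
}))
formulae <- lapply(paste("z ~", rhs), as.formula)

# Assign each row to one of K folds once, so all models share the same splits
K <- 5
folds <- sample(rep(seq_len(K), length.out = nrow(dat)))

# Hypothetical helper: mean K-fold misclassification rate at a 0.5 cutoff
cv_misclass <- function(f, data, folds) {
  y <- data[[all.vars(f)[1]]]  # response is the first variable in the formula
  mean(vapply(sort(unique(folds)), function(i) {
    fit <- glm(f, data = data[folds != i, ], family = binomial)
    p <- predict(fit, newdata = data[folds == i, ], type = "response")
    mean((p > 0.5) != y[folds == i])
  }, numeric(1)))
}

cv_err <- vapply(formulae, cv_misclass, numeric(1), data = dat, folds = folds)
names(cv_err) <- rhs
best <- names(cv_err)[which.min(cv_err)]
```

With p predictors there are 2^p - 1 non-empty subsets, so this brute-force loop is only practical for a small number of candidate variables.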

Answered 2013-06-04T23:31:09.537
0

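Would drop1() and add1() not help for your purposes? People usually caution that automated feature selection may not always be the most appropriate thing to do, but I assume you have made an informed choice about this.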
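As a rough illustration of what those functions do, the sketch below runs drop1() and add1() on a logistic regression, plus step(), which chains such single-term moves together. The data are simulated and the names dat, full, null and stepped are assumptions made for this example. Note that this is a stepwise search, not an exhaustive search over all subsets.

```r
# Simulated data for illustration only; a, b, c and z are made-up names
set.seed(1)
n <- 200
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$z <- rbinom(n, 1, plogis(1.5 * dat$a - dat$b))

full <- glm(z ~ a + b + c, data = dat, family = binomial)
null <- glm(z ~ 1, data = dat, family = binomial)

# Effect of removing each single term from the full model
drop_tab <- drop1(full, test = "Chisq")

# Effect of adding each single candidate term to the intercept-only model
add_tab <- add1(null, scope = ~ a + b + c, test = "Chisq")

# Stepwise search by AIC, repeating add/drop moves until no move improves it
stepped <- step(null,
                scope = list(lower = ~ 1, upper = ~ a + b + c),
                direction = "both", trace = 0)
```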

Answered 2013-06-04T23:20:50.797