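I want to create a custom filter that uses the LASSO (glmnet with alpha = 1) to select features, i.e. the features to which glmnet assigns non-zero coefficients are the selected ones. The reason I want glmnet as a filter is so that I can feed the selected features to another learner during resampling.
To avoid reinventing the wheel I used makeLearner inside the filter, but I'm not sure whether this is valid. Here is what I have (at this stage I'm only interested in survival data):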
makeFilter(
  name = "LASSO.surv",
  desc = "Use the LASSO method to select features for survival data",
  pkg = "glmnet",
  supported.tasks = c("surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, folds, ...) {
    # Task data with the target split off (not used further below)
    data = getTaskData(task, target.extra = TRUE, recode.target = "surv")
    # Cross-validated LASSO (alpha = 1) fitted through mlr's glmnet learner
    lasso.lrn = makeLearner(cl = "surv.cvglmnet", id = "lasso", predict.type = "response", alpha = 1, nfolds = folds)
    model = train(lasso.lrn, task)
    mod = model$learner.model
    # Coefficients at the lambda with the lowest cross-validated error
    coef.min = coef(mod, s = mod$lambda.min)
    res = as.matrix(coef.min)[, 1]
    # Return only the non-zero coefficients, i.e. the selected features
    active.min = which(as.matrix(coef.min) != 0)
    res[active.min]
  })
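For comparison, here is a minimal sketch of the same selection step done with glmnet directly, without going through makeLearner. The data are simulated and the names `x`, `y`, `cvfit` and `selected` are made up for illustration; it assumes `cv.glmnet` accepts a two-column (time, status) matrix as the response for `family = "cox"`.

```r
library(glmnet)  # provides cv.glmnet() and its coef() method

set.seed(1)
n <- 200
p <- 10
# Simulated predictors; only x1 and x2 influence the hazard (toy assumption)
x <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
time <- rexp(n, rate = exp(x[, "x1"] - x[, "x2"]))
status <- rbinom(n, size = 1, prob = 0.7)   # 1 = event, 0 = censored
y <- cbind(time = time, status = status)    # two-column response for family = "cox"

# alpha = 1 is the LASSO penalty; lambda is chosen by 5-fold cross-validation
cvfit <- cv.glmnet(x, y, family = "cox", alpha = 1, nfolds = 5)

# Coefficients at lambda.min as a named numeric vector (a Cox fit has no intercept)
coefs <- as.matrix(coef(cvfit, s = "lambda.min"))[, 1]
selected <- coefs[coefs != 0]
print(names(selected))
```

Because the folds are drawn at random, the selected set can change between runs unless the seed is fixed.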
I then use the filter like this:
task = wpbc.task
inner = makeResampleDesc("CV", iters = 5, stratify = TRUE)  # Tuning
# Baseline: plain Cox model on all features
cox.lrn = makeLearner(cl = "surv.coxph", id = "coxph", predict.type = "response")
# Cox model trained only on the features chosen by the LASSO filter
cox.filt.lrn = makeFilterWrapper(
  makeLearner(cl = "surv.coxph", id = "cox.filt", predict.type = "response"),
  fw.method = "LASSO.surv",
  fw.perc = 0.5,
  folds = 5
)
learners = list(cox.lrn, cox.filt.lrn)
benchmark(learners, task, inner, measures = list(cindex), show.info = TRUE)
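This seems to work (albeit slowly), although I realize that I'm not yet making use of the fw.perc parameter. The filtered learner gives better results than the Cox model alone: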
Task: wpbc-example, Learner: coxph
Resampling: cross-validation
Measures: cindex
[Resample] iter 1: 0.5884477
[Resample] iter 2: 0.6355556
[Resample] iter 3: 0.5333333
[Resample] iter 4: 0.5256410
[Resample] iter 5: 0.7142857
Aggregated Result: cindex.test.mean=0.5994527
Task: wpbc-example, Learner: cox.filt.filtered
Resampling: cross-validation
Measures: cindex
[Resample] iter 1: 0.5379061
[Resample] iter 2: 0.6533333
[Resample] iter 3: 0.7022222
[Resample] iter 4: 0.6452991
[Resample] iter 5: 0.6764706
Aggregated Result: cindex.test.mean=0.6430463
       task.id        learner.id cindex.test.mean
1 wpbc-example             coxph        0.5994527
2 wpbc-example cox.filt.filtered        0.6430463
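Regarding fw.perc: a rough sketch of how the filter's return value might be reshaped so that a percentage cut-off becomes meaningful. It assumes the wrapper ranks features by descending filter value, in which case negative coefficients would need to be turned into magnitudes and unselected features given a score of zero. `lassoScores` is a hypothetical helper, not part of mlr, and the coefficients are made up for illustration.

```r
# Hypothetical helper (not part of mlr): one non-negative score per task feature
lassoScores <- function(coefs, feature.names) {
  # Start with a score of zero for every feature
  scores <- setNames(numeric(length(feature.names)), feature.names)
  # Coefficient names that do not match a feature name are ignored
  common <- intersect(names(coefs), feature.names)
  # Rank by magnitude so that strong negative effects are not pushed to the bottom
  scores[common] <- abs(coefs[common])
  scores
}

# Made-up non-zero coefficients, for illustration only
coefs <- c(a = -0.8, c = 0.3)
features <- c("a", "b", "c", "d")
scores <- lassoScores(coefs, features)

# Keep the top 50% by score, mimicking a percentage cut-off
n.keep <- ceiling(0.5 * length(scores))
top <- names(sort(scores, decreasing = TRUE))[seq_len(n.keep)]
print(scores)
print(top)
```

With factor features the glmnet coefficient names would be dummy-variable names rather than feature names, so this sketch would silently drop them. If I read the documentation correctly, makeFilterWrapper also takes an fw.threshold argument, which might be a closer match to "keep every non-zero coefficient" than fw.perc.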
My question is: is it OK to use makeLearner and then train that learner inside a filter? Is there a better way?