我想重复 in 的超参数调整(和alpha
/或lambda
)以避免较小数据集的可变性glmnet
mlr3
在caret
中,我可以这样做"repeatedcv"
因为我真的很喜欢mlr3
家庭包,所以我想用它们来分析。但是,我不确定如何执行此步骤的正确方法mlr3
示例数据
#library
library(caret)
library(mlr3verse)
library(mlbench)
# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes
# get small training data
train.data <- data[1:60,]
由reprex 包(v1.0.0)于 2021-03-18 创建
caret
使用and接近(调整alpha
and lambda
)"cv"
"repeatedcv"
trControlCv <- trainControl("cv",
number = 5,
classProbs = TRUE,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
number = 5,
repeats= 20,
classProbs = TRUE,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1
set.seed(23)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2
# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlRCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3
set.seed(55)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlRCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4
由reprex 包(v1.0.0)于 2021-03-18 创建
用交叉验证展示不同的系数,用重复的交叉验证展示相同的系数
# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE
# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE
由reprex 包(v1.0.0)于 2021-03-18 创建
第mlr3
一种方法使用cv.glmnet
(内部调整lambda
)
# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# create learner
learner = as_learner(glmnet_lrn)
# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1
set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2
由reprex 包(v1.0.0)于 2021-03-18 创建
通过交叉验证展示不同的系数
# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.323460895
#> age 0.005065928
#> glucose 0.019727881
#> insulin .
#> mass .
#> pedigree .
#> pregnant 0.001290570
#> pressure .
#> triceps 0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.146190752
#> age 0.003840963
#> glucose 0.019015433
#> insulin .
#> mass .
#> pedigree .
#> pregnant .
#> pressure .
#> triceps 0.018841557
由reprex 包(v1.0.0)于 2021-03-18 创建
更新 1:我取得的进展
根据下面的评论和这个评论,我可以使用rsmp
和
AutoTuner
这个答案建议不要调整cv.glmnet
但是glmnet
(当时在 ml3 中不可用)
第二种mlr3
方法使用glmnet
(重复调整alpha
和lambda
)
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")
# turn to learner
learner = as_learner(glmnet_lrn)
# make search space
search_space = ps(
alpha = p_dbl(lower = 0, upper = 1),
s = p_dbl(lower = 1, upper = 1)
)
# set terminator
terminator = trm("evals", n_evals = 20)
#set tuner
tuner = tnr("grid_search", resolution = 3)
# tune the learner
at = AutoTuner$new(
learner = learner,
rsmp("repeated_cv"),
measure = msr("classif.ce"),
search_space = search_space,
terminator = terminator,
tuner=tuner)
at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights
开放式问题
我如何证明我的第二种方法是有效的,并且我得到不同种子的相同或相似系数?IE。如何提取最终模型的系数AutoTuner
set.seed(23)
at$train(train.task) -> tune1
set.seed(2323)
at$train(train.task) -> tune2
由reprex 包(v1.0.0)于 2021-03-18 创建