The yardstick and modelr packages can help.
I use caret to call glmnet through train(), and the returned object has a $resample element that contains the RMSE, Rsquared, and MAE for each cross-validation fold.
library( tictoc ) # If you don't want to install this, just take out the calls to tic() and toc()
library( caret )
library( tidyverse )
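# createFolds() expects the outcome vector rather than the whole data frame, and defaults to k = 10.
# Build 5 folds over the same 10,000 rows that are passed to train() below, so the indices match.
training_folds <- createFolds( dmv$duration_avg[1:1E4], k = 5, returnTrain = TRUE )
ctl <- trainControl( method = "cv", number = 5, index = training_folds )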
tic()
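# Note: centering and scaling apply to every numeric column, including duration_avg,
# so the metrics reported further down are in standardized units.
dmv_pp <- preProcess( dmv, method = c( "nzv", "center", "scale" ))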
toc() # This can take a while
dmv_train <- predict( dmv_pp, dmv )
# Using just a subset of the data, because otherwise I run out of memory.
mdl <- train( duration_avg ~ ., data = dmv_train[1:1E4,], trControl = ctl, method = "glmnet",
tuneGrid = expand.grid(
alpha = c( 0, 0.5, 1),
lambda = c( 0.001, 0.01 )
)
)
mdl$resample %>% names()
mdl %>%
listviewer::jsonedit() # Interactive view of the fitted object; it should contain $resample
dmv_train <- dmv_train %>%
modelr::add_predictions( mdl, var = "predicted_duration_avg" ) # I think this should work with any model that has a predict() method
dmv_train %>%
yardstick::metrics( duration_avg, predicted_duration_avg )
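
To see what the per-fold table looks like without the dmv data, here is a minimal sketch on the built-in mtcars data set. The stand-ins are assumptions on my part: mtcars replaces dmv, mpg replaces duration_avg, and fit is just a name for the illustration. It needs the glmnet package installed in addition to caret and dplyr.

```r
library(caret)   # train(), trainControl(), createFolds()
library(dplyr)   # summarise(), across()

set.seed(42)     # fold assignment is random, so fix the seed

# mtcars stands in for dmv; mpg plays the role of duration_avg
folds <- createFolds(mtcars$mpg, k = 5, returnTrain = TRUE)
ctl   <- trainControl(method = "cv", number = 5, index = folds)

fit <- train(
  mpg ~ ., data = mtcars,
  method    = "glmnet",
  trControl = ctl,
  tuneGrid  = expand.grid(alpha = c(0, 0.5, 1), lambda = c(0.001, 0.01))
)

# One row per fold, with columns RMSE, Rsquared, MAE and Resample.
# By default only the folds for the selected alpha/lambda are kept.
fit$resample

# Mean and standard deviation of each metric across the folds
fit$resample %>%
  summarise(across(c(RMSE, Rsquared, MAE), list(mean = mean, sd = sd)))
```

If you want the per-fold metrics for every alpha/lambda combination rather than only the selected one, trainControl() has a returnResamp argument that can be set to "all".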