r - Summary statistics in glmnet

Question

I have been working with a dataset and using glmnet for linear LASSO/Ridge regression.

For simplicity, let's assume the model I am using is the following:

cv.glmnet(train.features, train.response, alpha=1, nlambda=100, type.measure = "mse", nfolds = 10)

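I am preparing a presentation for a client, and I need to show the t-statistics of the variables and the R-squared value. I also need to plot the residuals against the model's fitted values.

Before writing functions to do this from scratch, I wanted to ask whether this is already covered in the library. I checked the glmnet vignette but didn't find anything.

Thanks for your help!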

score 9 · Accepted Answer

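A partial answer to your question: the plotres function in the plotmo R package is an easy way to plot residuals for a wide variety of models, including glmnet and cv.glmnet models. The plotres vignette included with the package has details. For example: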

library(glmnet)
data(longley)
mod <- glmnet(data.matrix(longley[,1:6]), longley[,7])
library(plotmo) # for plotres
plotres(mod)

This produces a set of residual plots. You can select subplots and modify the plots by passing the appropriate arguments to plotres.
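To illustrate selecting a single subplot, here is a minimal sketch that reuses the longley model from above. It assumes that which = 3 selects the residuals-versus-fitted plot, as described in the plotres help page; check ?plotres for the numbering in your installed version.

```r
library(glmnet)
library(plotmo) # for plotres

data(longley)
mod <- glmnet(data.matrix(longley[, 1:6]), longley[, 7])

# Draw only the residuals-versus-fitted subplot
# (assumption: subplot 3 is "residuals vs fitted", see ?plotres)
plotres(mod, which = 3)
```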

score 0 · Accepted Answer

The "yardstick" and "modelr" packages can help.

I use caret to call glmnet via "train()", and the returned object has a $resample element that contains the RMSE, Rsquared, and MAE for each cross-validation fold.

library( tictoc ) # If you don't want to install this, just take out the calls to tic() and toc()
library( caret )
library( tidyverse )

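# createFolds() expects the outcome vector; build 5 folds on the same 1E4 rows passed to train() below
training_folds <- createFolds( dmv$duration_avg[1:1E4], k = 5, returnTrain = TRUE )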

ctl <- trainControl( method = "cv", number = 5, index = training_folds )
tic()
dmv_pp <- preProcess( dmv, method = c( "nzv", "center", "scale" ))
toc() # This can take a while

dmv_train <- predict( dmv_pp, dmv )
# Using just a subset of the data, because otherwise I run out of memory.
mdl <- train( duration_avg ~ ., data = dmv_train[1:1E4,], trControl = ctl,  method = "glmnet",
              tuneGrid = expand.grid(
                alpha = c( 0, 0.5, 1),
                lambda = c( 0.001, 0.01 )
              )
          )

mdl$resample %>% names()

mdl %>%
    listviewer::jsonedit() # This object should contain $resample

dmv_train <- dmv_train %>%
    modelr::add_predictions( mdl, var = "predicted_duration_avg" ) # I think this should work with any model that has a predict() method

dmv_train %>%
  yardstick::metrics( duration_avg, predicted_duration_avg )
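
If you fit with cv.glmnet directly, as in the question, rather than through caret, the in-sample R-squared and a residuals-versus-fitted plot can also be computed by hand from predict(). The following is only a sketch: it uses the built-in longley data as a stand-in for your own features and response, picks lambda.min as the penalty, and uses nfolds = 4 only because longley has just 16 rows. Note that this R-squared is measured on the training data, so it will be more optimistic than the per-fold values in $resample.

```r
library(glmnet)

# Stand-in data (assumption): replace with your own train.features / train.response
data(longley)
x <- data.matrix(longley[, 1:6])
y <- longley[, 7]

set.seed(1) # fold assignment is random
cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 4, type.measure = "mse")

# Fitted values and residuals at the lambda with the lowest CV error
fitted_vals <- as.numeric(predict(cvfit, newx = x, s = "lambda.min"))
resid_vals  <- y - fitted_vals

# In-sample R-squared: 1 - RSS / TSS
r_squared <- 1 - sum(resid_vals^2) / sum((y - mean(y))^2)
print(r_squared)

# Residuals against fitted values
plot(fitted_vals, resid_vals, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```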
