r - How should I get the coefficients of a lasso model?

Question

This is my code:

library(MASS)
library(caret)
df <- Boston
set.seed(3721)
cv.10.folds <- createFolds(df$medv, k = 10)
lasso_grid <- expand.grid(fraction=c(1,0.1,0.01,0.001))
lasso <- train(medv ~ ., 
               data = df, 
               preProcess = c("center", "scale"),
               method ='lasso',
               tuneGrid = lasso_grid,
               trControl= trainControl(method = "cv", 
                                       number = 10, 
                                       index = cv.10.folds))  

lasso

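Unlike with a linear model, I cannot find the coefficients of the lasso regression model in summary(lasso). How should I get them? Or maybe I can use glmnet?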

score 2 · Accepted Answer

When you call train with method = "lasso", enet from the elasticnet package is called:

lasso$finalModel$call
elasticnet::enet(x = as.matrix(x), y = y, lambda = 0)

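The vignette says:

The LARS-EN algorithm computes the complete elastic net solution simultaneously for all values of the shrinkage parameter at the same computational cost as a least squares fit

lasso$finalModel$beta.pure holds all 16 sets of coefficients, one for each of the 16 L1 norm values in lasso$finalModel$L1norm: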

length(lasso$finalModel$L1norm)
[1] 16

dim(lasso$finalModel$beta.pure)
[1] 16 13
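
As a quick check of how these pieces line up, here is a minimal sketch that refits enet directly rather than through caret. Standardising the predictors with scale() is my assumption for mimicking preProcess = c("center", "scale"), and the object names (fit, last_row, via_predict) are made up for illustration:

```r
library(MASS)        # Boston data
library(elasticnet)  # enet() and its predict method

# Standardise the predictors (assumption: stands in for caret's preProcess step)
x <- scale(as.matrix(Boston[, names(Boston) != "medv"]))
y <- Boston$medv

# lambda = 0 gives the pure lasso path, as in lasso$finalModel$call
fit <- enet(x = x, y = y, lambda = 0)

# One row of coefficients and one L1 norm per step on the path
n_steps <- nrow(fit$beta.pure)
length(fit$L1norm) == n_steps

# Coefficients at the last step, read straight from the matrix
last_row <- fit$beta.pure[n_steps, ]

# The same step requested through predict() with an explicit step index
via_predict <- predict(fit, type = "coef", s = n_steps, mode = "step")$coefficients
all.equal(unname(last_row), unname(via_predict))
```

So indexing a row of beta.pure and asking predict() for that step number should give the same coefficients.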

You can also use predict to see them:

predict(lasso$finalModel,type="coef")
$s
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

$fraction
 [1] 0.00000000 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333
 [7] 0.40000000 0.46666667 0.53333333 0.60000000 0.66666667 0.73333333
[13] 0.80000000 0.86666667 0.93333333 1.00000000

$mode
[1] "step"

$coefficients
          crim        zn       indus      chas        nox       rm        age
0   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 0.000000 0.00000000
1   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 0.000000 0.00000000
2   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 1.677765 0.00000000
3   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 2.571071 0.00000000
4   0.00000000 0.0000000  0.00000000 0.0000000  0.0000000 2.716138 0.00000000
5   0.00000000 0.0000000  0.00000000 0.2586083  0.0000000 2.885615 0.00000000
6  -0.05232643 0.0000000  0.00000000 0.3543411  0.0000000 2.953605 0.00000000
7  -0.13286554 0.0000000  0.00000000 0.4095229  0.0000000 2.984026 0.00000000
8  -0.21665925 0.0000000  0.00000000 0.5196189 -0.5933941 3.003512 0.00000000
9  -0.32168140 0.3326103  0.00000000 0.6044308 -1.0246080 2.973693 0.00000000
10 -0.33568474 0.3771889 -0.02165730 0.6165190 -1.0728128 2.967696 0.00000000
11 -0.42820289 0.4522827 -0.09212253 0.6407298 -1.2474934 2.932427 0.00000000
12 -0.62605363 0.7005114  0.00000000 0.6574277 -1.5655601 2.832726 0.00000000
13 -0.88747102 1.0150162  0.00000000 0.6856705 -1.9476465 2.694820 0.00000000
14 -0.91679342 1.0613165  0.09956489 0.6837833 -2.0217269 2.684401 0.00000000
15 -0.92906457 1.0826390  0.14103943 0.6824144 -2.0587536 2.676877 0.01948534

The hyperparameter tuned by caret is the fraction of the maximum L1 norm, so in the result you provided it is 1, i.e. the maximum:

lasso
The lasso 

506 samples
 13 predictor

Pre-processing: centered (13), scaled (13) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 51, 51, 51, 50, 51, 50, ... 
Resampling results across tuning parameters:

  fraction  RMSE      Rsquared   MAE     
  0.001     9.182599  0.5075081  6.646013
  0.010     9.022117  0.5075081  6.520153
  0.100     7.597607  0.5572499  5.402851
  1.000     6.158513  0.6033310  4.140362

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was fraction = 1.

To get the coefficients for the best fraction (fraction = 1 is the last of the 16 steps, hence s = 16):

predict(lasso$finalModel,type="coef",s=16)
$s
[1] 16

$fraction
[1] 1

$mode
[1] "step"

$coefficients
       crim          zn       indus        chas         nox          rm 
-0.92906457  1.08263896  0.14103943  0.68241438 -2.05875361  2.67687661 
        age         dis         rad         tax     ptratio       black 
 0.01948534 -3.10711605  2.66485220 -2.07883689 -2.06264585  0.85010886 
      lstat 
-3.74733185
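
On the glmnet option raised in the question, a rough sketch of what that could look like is below. It is not equivalent to the caret fit above: glmnet tunes a penalty lambda rather than a fraction, and it reports coefficients on the original scale of the predictors, so the numbers will not match the standardised ones shown here. The names cv_fit and lasso_coefs are just for illustration:

```r
library(MASS)    # Boston data
library(glmnet)  # cv.glmnet() and coef()

x <- as.matrix(Boston[, names(Boston) != "medv"])
y <- Boston$medv

set.seed(3721)
# alpha = 1 selects the lasso penalty; lambda is chosen by 10-fold CV
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# Sparse one-column matrix: intercept plus the 13 predictors,
# evaluated at the lambda with the lowest cross-validated error
lasso_coefs <- coef(cv_fit, s = "lambda.min")
lasso_coefs
```

Predictors dropped by the lasso are the entries of lasso_coefs that are exactly zero.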

score 0 · Accepted Answer

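I noticed that the approach above can run into problems if you define your own hyperparameter tuning grid. predict.enet seems to impose its own grid, which typically does not correspond to the grid defined for train().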

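If that is the case, you can set the "mode" argument to "fraction" and pass a vector of fractions from the train() output to the "s" argument:

predict(lasso$finalModel, type = "coef", mode = "fraction", s = lasso$bestTune)

"s" can also be your best tuning parameter, as determined by train():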

predict(lasso$finalModel, type = "coef", mode = "fraction", s = as.numeric(lasso$bestTune))
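
To illustrate mode = "fraction" over a whole grid, here is a minimal sketch that fits enet directly on standardised predictors (my assumption for mimicking the caret preprocessing) and asks for the coefficients at each fraction from the question's lasso_grid. The names fit, fractions and grid_coefs are hypothetical:

```r
library(MASS)        # Boston data
library(elasticnet)  # enet() and its predict method

x <- scale(as.matrix(Boston[, names(Boston) != "medv"]))
y <- Boston$medv
fit <- enet(x = x, y = y, lambda = 0)

# Same values as lasso_grid in the question
fractions <- c(0.001, 0.01, 0.1, 1)

# One row of coefficients per requested fraction of the maximum L1 norm
grid_coefs <- predict(fit, type = "coef", mode = "fraction",
                      s = fractions)$coefficients
rownames(grid_coefs) <- as.character(fractions)

# Number of non-zero coefficients at each fraction
rowSums(grid_coefs != 0)

# fraction = 1 is the end of the path, i.e. the last row of beta.pure
all.equal(unname(grid_coefs["1", ]),
          unname(fit$beta.pure[nrow(fit$beta.pure), ]))
```

With a caret fit, lasso$finalModel would take the place of fit, and the tuning grid is available as lasso$results$fraction.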

Created on 2020-09-11 by the reprex package (v0.3.0)
