
Glmnet with ridge regularization (alpha=0) computes the coefficients for the first lambda value differently when the lambda vector is chosen by the glmnet algorithm than when the same vector is passed explicitly in the call. For example, the following two models (which I would expect to be identical)

> m <- glmnet(rbind(c(1, 0), c(0, 1)), c(1, 0), alpha=0)
> m2 <- glmnet(rbind(c(1, 0), c(0, 1)), c(1, 0), alpha=0, lambda=m$lambda)

give completely different coefficients:

> coef(m, s=m$lambda[1])
3 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)  5.000000e-01
V1           1.010101e-36
V2          -1.010101e-36

> coef(m2, s=m2$lambda[1])
3 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept)  0.500000000
V1           0.000998004
V2          -0.000998004

The same thing happens with other datasets too. When lambda is not supplied to glmnet, all coefficients at the largest lambda, coef(m, s=m$lambda[1]), except for the intercept are very close to zero, and the predictions are equal for any X (due to rounding?).
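For instance, this can be checked with arbitrary new data (newX below is made up for illustration, not part of the original example):

> newX <- matrix(rnorm(10), ncol = 2)          # any 5 x 2 matrix of new observations
> predict(m, newx = newX, s = m$lambda[1])
# every fitted value is 0.5 (the intercept), since V1 and V2 are ~1e-36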

My questions:

  1. Why is this the case? Is the difference intentional?
  2. How exactly are the coefficients at the largest lambda, coef(m, s=m$lambda[1]), determined?

1 Answer


This is a tricky one. When alpha=0, the "starting" value of lambda (the value at which all coefficients other than the intercept are zero) is infinity. Since we want to produce a grid of values that decreases geometrically from this initial value toward zero, infinity is not of much use. So instead we use the starting value that alpha=0.001 would give (500 in this example), and that is the largest lambda you see.
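(For concreteness, here is a rough way to reproduce that 500. This is only a sketch assuming the usual rule lambda_max = max_j |<x_j, y>| / (n * alpha), applied with alpha replaced by 0.001, with x standardized using 1/n variances and y centered; glmnet's exact internal conventions may differ slightly.)

> X <- rbind(c(1, 0), c(0, 1)); y <- c(1, 0); n <- 2
> sds <- apply(X, 2, function(v) sqrt(mean((v - mean(v))^2)))   # 1/n-style column sds
> Xs <- scale(X, center = TRUE, scale = sds)                    # standardized predictors
> max(abs(crossprod(Xs, y - mean(y)))) / (n * 0.001)            # 500, matching m$lambda[1]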

So in m the coefficients are effectively zero, but the largest lambda reported is 500 (while it is really infinity).

In m2 we actually compute the fit at 500 for the first position, so the coefficients are not exactly zero.

To verify what I am saying, note that the subsequent coefficients all match.
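For example (a quick comparison, not part of the original answer), the coefficient paths should agree at every lambda except the first, up to numerical tolerance:

> max(abs(coef(m, s = m$lambda[2]) - coef(m2, s = m2$lambda[2])))   # essentially 0
> max(abs(m$beta[, -1] - m2$beta[, -1]))                            # essentially 0 for the rest of the path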

Trevor Hastie

answered 2014-04-09T12:38:47.173