Glmnet with ridge regularization calculates coefficients for the first lambda value differently when the lambda vector is chosen by the glmnet algorithm than when the same vector is passed explicitly in the function call. For example, the following two models (which I would expect to be identical)
> library(glmnet)
> m <- glmnet(rbind(c(1, 0), c(0, 1)), c(1, 0), alpha=0)
> m2 <- glmnet(rbind(c(1, 0), c(0, 1)), c(1, 0), alpha=0, lambda=m$lambda)
give completely different coefficients:
> coef(m, s=m$lambda[1])
3 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)  5.000000e-01
V1           1.010101e-36
V2          -1.010101e-36
> coef(m2, s=m2$lambda[1])
3 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept)  0.500000000
V1           0.000998004
V2          -0.000998004
The same thing happens with other datasets as well. When lambda is not supplied to glmnet, all of the coefficients at lambda.max, i.e. coef(m, s=m$lambda[1]), are very close to zero except for the intercept, so the predictions are identical for any X (due to rounding?).
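As an illustration of that last point, this is the kind of check I have in mind (newx below is just an arbitrary matrix made up for this example; it reuses the fits m and m2 from above):
> newx <- rbind(c(10, -3), c(-7, 4))          # arbitrary new observations
> predict(m, newx=newx, s=m$lambda[1])        # both rows are ~0.5, i.e. just the intercept
> predict(m2, newx=newx, s=m2$lambda[1])      # rows differ, since V1/V2 are ~1e-3 rather than ~1e-36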
My questions:
- Why is this the case? Is the difference intentional?
- How exactly are the coefficients at the largest lambda, coef(m, s=m$lambda[1]), determined?