r - Difference between glmnet() and cv.glmnet() in R?

Question

I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet package, specifically its Poisson family. Here's my code:

# de <- data imported from sql connection        
x <- model.matrix(~.,data = de[,2:7])
y <- (de[,1])
reg <- cv.glmnet(x,y, family = "poisson", alpha = 1)
reg1 <- glmnet(x,y, family = "poisson", alpha = 1)

**Co <- coef(?reg or reg1?,s=???)**

summ <- summary(Co)
c <- data.frame(Name= rownames(Co)[summ$i],
       Lambda= summ$x)
c2 <- c[with(c, order(-Lambda)), ]

The beginning imports a large amount of data from my database in SQL. I then put it in matrix format and separate the response from the predictors.

This is where I'm confused: I can't figure out exactly what the difference is between the glmnet() function and the cv.glmnet() function. I realize that the cv.glmnet() function is a k-fold cross-validation of glmnet(), but what exactly does that mean in practical terms? They provide the same value for lambda, but I want to make sure I'm not missing something important about the difference between the two.

I'm also unclear as to why it runs fine when I specify alpha=1 (supposedly the default), but not if I leave it out?

Thanks in advance!

score 21 · Accepted Answer

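glmnet is an R package for fitting penalized regression models such as the lasso, ridge regression, and the elastic net in between. The alpha argument determines which type of model is fit: alpha=0 fits a ridge model and alpha=1 fits a lasso model.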
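To make the role of alpha concrete, here is a minimal sketch on simulated Poisson data. The matrix x and response y are made-up stand-ins for the asker's de table, not the real data, so treat the numbers it prints as illustrative only.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the question's data (an assumption, not the real `de`)
n <- 200
x <- matrix(rnorm(n * 6), nrow = n, ncol = 6)
y <- rpois(n, lambda = exp(0.3 * x[, 1] - 0.2 * x[, 2]))

fit_lasso <- glmnet(x, y, family = "poisson", alpha = 1)  # lasso penalty
fit_ridge <- glmnet(x, y, family = "poisson", alpha = 0)  # ridge penalty

# $df holds the number of nonzero coefficients at each lambda on the path
range(fit_lasso$df)  # lasso can set coefficients exactly to zero
range(fit_ridge$df)  # ridge shrinks coefficients rather than dropping predictors
```

Both calls return a whole path of fits, one per lambda; neither of them chooses a lambda for you.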

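cv.glmnet() performs cross-validation, 10-fold by default, which can be changed with nfolds. A 10-fold CV randomly divides your observations into 10 non-overlapping groups (folds) of approximately equal size. The first fold is held out as the validation set and the model is fit on the remaining 9 folds; this is repeated so that each fold serves once as the validation set, and the errors are averaged. The bias-variance trade-off is usually the motivation for this kind of model validation. For lasso and ridge models, CV helps choose the value of the tuning parameter lambda.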
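The fold splitting described above can be sketched as follows. The data are simulated (an assumption, not the asker's table), and the fold assignment is built by hand and passed through the foldid argument only to make the split visible; calling cv.glmnet() with nfolds = 10 would produce the same kind of random split internally.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the question's data (an assumption, not the real `de`)
n <- 200
x <- matrix(rnorm(n * 6), nrow = n, ncol = 6)
y <- rpois(n, lambda = exp(0.3 * x[, 1] - 0.2 * x[, 2]))

# Randomly assign each observation to one of 10 non-overlapping folds
foldid <- sample(rep(1:10, length.out = n))
table(foldid)  # 20 observations per fold here

# Each fold is held out once while the model is fit on the other nine
cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1, foldid = foldid)

# One averaged CV error (cvm) and its standard error (cvsd) per candidate lambda
head(data.frame(lambda = cvfit$lambda, cvm = cvfit$cvm, cvsd = cvfit$cvsd))
```

Fixing foldid is also a convenient way to compare several alpha values on identical folds.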

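In your example, you can run plot(reg) or look at reg$lambda.min to see the value of lambda that gives the smallest CV error. You can then derive the test error for that value of lambda (with family = "poisson" the default CV measure is the deviance rather than the MSE). By default, glmnet() fits the model over an automatically selected sequence of lambda values and does not pick one for you, so an arbitrarily chosen point on that path may not give the lowest test error. Hope this helps!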
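For the coef() line left open in the question, one possible way to fill it in is sketched below. The data are simulated stand-ins (an assumption), and reg and reg1 mirror the names used in the question.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the question's data (an assumption, not the real `de`)
n <- 200
x <- matrix(rnorm(n * 6), nrow = n, ncol = 6)
y <- rpois(n, lambda = exp(0.3 * x[, 1] - 0.2 * x[, 2]))

reg  <- cv.glmnet(x, y, family = "poisson", alpha = 1)  # path plus CV error per lambda
reg1 <- glmnet(x, y, family = "poisson", alpha = 1)     # path only, no lambda chosen

reg$lambda.min  # lambda with the smallest cross-validated error

# Coefficients at that lambda, taken from the CV object...
co_cv <- coef(reg, s = "lambda.min")
# ...or from the plain glmnet fit, reusing the lambda that CV selected
co_path <- coef(reg1, s = reg$lambda.min)

# Both routes give the same coefficients
all.equal(as.vector(as.matrix(co_cv)), as.vector(as.matrix(co_path)))
```

In other words, reg1 on its own cannot tell you which s to use; the cv.glmnet() object is what supplies lambda.min.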


score 2 · Accepted Answer

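Choose between reg$lambda.min and reg$lambda.1se. lambda.min will give you the lowest cross-validated error, but depending on how much error you can tolerate, you may prefer reg$lambda.1se, since that value shrinks the number of predictors further. You could also use the mean of reg$lambda.min and reg$lambda.1se as your lambda value.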
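A rough way to compare these three choices is to count how many predictors survive at each lambda. This is only a sketch on simulated data (an assumption, not the asker's table), and the helper name n_nonzero is made up for illustration.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the question's data (an assumption, not the real `de`)
n <- 200
x <- matrix(rnorm(n * 6), nrow = n, ncol = 6)
y <- rpois(n, lambda = exp(0.3 * x[, 1] - 0.2 * x[, 2]))

reg <- cv.glmnet(x, y, family = "poisson", alpha = 1)

# Three candidate lambdas: the CV minimum, the 1-SE rule, and their mean
lambdas <- c(min = reg$lambda.min,
             one_se = reg$lambda.1se,
             mean = mean(c(reg$lambda.min, reg$lambda.1se)))

# Hypothetical helper: number of nonzero coefficients, intercept excluded
n_nonzero <- function(s) sum(coef(reg, s = s)[-1, 1] != 0)

lambdas
sapply(lambdas, n_nonzero)
```

Because lambda.1se is never smaller than lambda.min, it applies at least as much penalty, which is why it tends to keep fewer predictors.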
