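I am using the glmnet and caret packages in R to fit an elastic net on a generalized linear model.

My response variable is cost (where cost > $0), so I would like to specify a Gaussian family with a log link for my GLM. However, glmnet does not seem to let me specify (link="log"), as in: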

> lasso_fit <- glmnet(x, y, alpha=1, family="gaussian"(link="log"), lambda.min.ratio=.001)

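I have tried different variants, with and without quotes, but had no luck. The glmnet documentation does not discuss how to include a log link.

Am I missing something? Does family="gaussian" already implicitly assume a log link?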

2 Answers

It is a bit confusing, but the family arguments of glmnet and glm are quite different. In glm, you can specify a character string like "gaussian", or you can specify a function with some arguments, like gaussian(link="log"). In glmnet, you can only specify the family with a character string, like "gaussian", and there is no way to set the link through that argument.

The default link for gaussian is the identity function, that is, no transformation at all. But, remember that a link function is just a transformation of your y variable; you can just specify it yourself:

glmnet(x, log(y), family="gaussian")

Also note that the default link for the poisson family is log, but the objective function will change. See the Details under ?glmnet in the first couple of paragraphs.


Your comments have led me to rethink my answer; I have evidence that it is not correct.

As you point out, there is a difference between E[log(Y)] and log(E[Y]). I think what the above code does is to fit E[log(Y)], which is not what you want. Here is some code to generate data and confirm what you noted in the comments:

# Generate data
set.seed(1)
x <- replicate(3,runif(1000))
y <- exp(2*x[,1] + 3*x[,2] + x[,3] + runif(1000))
df <- data.frame(y,x)

# Run the model you *want*
glm(y~., family=gaussian(link="log"), data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4977746   2.0449443   3.0812333   0.9451073 

# Run the model you *don't want* (in two ways)    
glm(log(y)~., family=gaussian(link='identity'), data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4726745   2.0395798   3.0167274   0.9957110 
lm(log(y)~.,data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4726745   2.0395798   3.0167274   0.9957110 

# Run the glmnet code that I suggested - getting what you *don't want*.
library(glmnet)
glmnet.model <- glmnet(x,log(y),family="gaussian", thresh=1e-8, lambda=0)
c(glmnet.model$a0, glmnet.model$beta[,1])
#        s0        V1        V2        V3 
# 0.4726745 2.0395798 3.0167274 0.9957110 
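
To make the distinction concrete, here is a minimal base R sketch that reuses the simulated data from above. The names fit_link, fit_logy, geo_mean and the other summary variables are introduced only for illustration. It compares the mean of y with the averages implied by the two models, and with the back-transformed mean of log(y).

```r
# Same simulated data as above (base R only).
set.seed(1)
x <- replicate(3, runif(1000))
y <- exp(2 * x[, 1] + 3 * x[, 2] + x[, 3] + runif(1000))
df <- data.frame(y, x)

# Log-link Gaussian GLM: models log(E[Y]); fitted values are on the scale of y.
fit_link <- glm(y ~ ., family = gaussian(link = "log"), data = df)

# Linear model on log(y): models E[log(Y)]; exp() of its fitted values
# is a back-transformed quantity, not an estimate of E[Y].
fit_logy <- lm(log(y) ~ ., data = df)

mean_y        <- mean(y)
mean_link     <- mean(fitted(fit_link))
mean_backtran <- mean(exp(fitted(fit_logy)))

# exp(mean(log(y))) is the geometric mean of y.
geo_mean <- exp(mean(log(y)))

print(c(mean_y = mean_y, mean_link = mean_link,
        mean_backtran = mean_backtran, geo_mean = geo_mean))
```

Because exp(mean(log(y))) is the geometric mean, it can never exceed mean(y). That is the gap between E[log(Y)] and log(E[Y]) showing up in the data, and it is why transforming y is not a substitute for a log link.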
Answered 2014-08-08T16:01:27.763

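I know this is an old question, but in the current version of glmnet (4.0-2), you can pass a glm family function as the family argument instead of a character string, so you can use: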

glmnet(x, y, family=gaussian(link="log"))

Note that the package is faster when you use a character string argument.

Reference: https://glmnet.stanford.edu/articles/glmnetFamily.html
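
A minimal sketch of that call on simulated data, assuming glmnet 4.0 or later is installed. The data and the choice of lambda are made up for illustration and are not from the question.

```r
# Assumes glmnet >= 4.0; earlier versions accept only a character string.
library(glmnet)

# Simulated positive response, for illustration only.
set.seed(1)
x <- replicate(3, runif(1000))
y <- exp(2 * x[, 1] + 3 * x[, 2] + x[, 3] + runif(1000))

# Pass a family object rather than a string to get a log link.
fit <- glmnet(x, y, alpha = 1, family = gaussian(link = "log"))

# Coefficients at the smallest lambda on the fitted path.
coef(fit, s = min(fit$lambda))
```

With a family object, the model is fitted on the original scale of y, so there is no need to transform the response beforehand.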

Answered 2020-09-09T13:20:06.777