r - R: GLMNET odd behavior when model is rerun

Question

I am trying to use LASSO for variable selection, and attempted the implementation in R using the glmnet package. This is the code I wrote so far:

 set.seed(1)
 library(glmnet)
 return =  matrix(ret.ff.zoo[which(index(ret.ff.zoo) == beta.df$date[1]),])
 data = matrix(unlist(beta.df[which(beta.df$date == beta.df$date[1]),][,-1]), ncol = num.factors)
 dimnames(data)[[2]] <- names(beta.df)[-1]
 model <- cv.glmnet(data, return, standardize = TRUE)
 coef(model)

This is what I obtain when I run it the first time:

 > coef(model)
 15 x 1 sparse Matrix of class "dgCMatrix"
                       1
 (Intercept) 0.009159452
 VAL         .          
 EQ          .          
 EFF         .          
 SIZE        0.018479078
 MOM         .          
 FSCR        .          
 MSCR        .          
 SY          .          
 URP         .          
 UMP         .          
 UNIF        .          
 OIL         .          
 DEI         .          
 PROD        .

BUT, this is what I obtain when I run the SAME code once more:

 > coef(model)
 15 x 1 sparse Matrix of class "dgCMatrix"
                       1
 (Intercept) 0.008031915
 VAL         .          
 EQ          .          
 EFF         .          
 SIZE        0.021250778
 MOM         .          
 FSCR        .          
 MSCR        .          
 SY          .          
 URP         .          
 UMP         .          
 UNIF        .          
 OIL         .          
 DEI         .          
 PROD        .

I am not sure why the model behaves this way. How would I be able to choose a final model if the coefficients change at every run? Does it use a different tuning parameter $\lambda$ at every run? I thought that cv.glmnet uses model$lambda.1se by default?!

I have just started learning about this package, and would appreciate any help I can get!

Thank you!

score 5 · Accepted Answer

The model is not deterministic. Run set.seed(1) immediately before fitting the model to get deterministic results.

Answered 2013-08-27T13:55:32.350
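
The randomness comes from cross-validation: observations are assigned to folds at random, so the fold layout depends on the state of the random number generator when the fit starts. Here is a minimal base-R sketch, assuming the fold labels are drawn by shuffling a balanced label vector. The helper draw_folds and the sample size are hypothetical and are not taken from glmnet's source.

```r
# Sketch only: illustrates seed-dependent fold assignment in base R.
n <- 20        # hypothetical number of observations
nfolds <- 10   # cv.glmnet's default number of folds

# Hypothetical helper: shuffle a balanced vector of fold labels 1..nfolds
draw_folds <- function(n, nfolds) {
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
folds_a <- draw_folds(n, nfolds)
folds_b <- draw_folds(n, nfolds)   # no reseed: the RNG state has advanced

set.seed(1)
folds_c <- draw_folds(n, nfolds)   # reseeded: reproduces folds_a

identical(folds_a, folds_c)        # TRUE, same seed gives the same folds
identical(folds_a, folds_b)        # a second draw without reseeding will generally differ
```

Different folds give a different cross-validation error curve, which can move lambda.1se and therefore the reported coefficients.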

score 1 · Accepted Answer

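You need to supply the same nfolds and foldid to both models. Check help(cv.glmnet) for more details. This makes the cross-validation splits identical, so if you run the models on the same data set you should get the same model.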
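
A minimal sketch of fixing the folds through the foldid argument, using simulated data. The objects x, y, n, p and the way foldid is drawn are illustrative assumptions, not the asker's data. When foldid is supplied, nfolds can be left out.

```r
library(glmnet)

# Simulated data (assumption: stands in for the asker's factor matrix)
set.seed(42)
n <- 100
p <- 14
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- 0.5 * x[, 1] + rnorm(n)

# Draw the fold labels once and reuse them for every fit
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

fit1 <- cv.glmnet(x, y, foldid = foldid)
fit2 <- cv.glmnet(x, y, foldid = foldid)

# Same data and same folds: compare the selected lambda and the coefficients
all.equal(fit1$lambda.1se, fit2$lambda.1se)
all.equal(as.matrix(coef(fit1)), as.matrix(coef(fit2)))
```

Storing foldid alongside the data also makes the fit reproducible across sessions without relying on the RNG state.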

score 0 · Accepted Answer

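Just an addition to @nograpes's answer. You should set the same seed every time before you fit a model. In short, one call to set.seed covers only one model fit. For example,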

set.seed(1)
model1 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
model2 = cv.glmnet(x, y, alpha = 0, family = 'binomial')

With the code above, the coefficients of model1 and model2 may differ.

set.seed(1)
model1 = cv.glmnet(x, y, alpha = 0, family = 'binomial')
set.seed(1)
model2 = cv.glmnet(x, y, alpha = 0, family = 'binomial')

The results are exactly the same only when the same seed is set before each model fit.
