r - What is the best way to run a loop of regressions in R?

Question

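Suppose I have indexable data sources X and Y, say matrices. I want to run a set of independent regressions and store the results. My initial approach would be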

# One row of results (intercept, slope) per regression
results <- matrix(nrow = nrow(X), ncol = 2)
for (i in seq_len(nrow(X))) {
        results[i, ] <- coefficients(lm(Y[i, ] ~ X[i, ]))
}

But loops are bad, so I could do it with lapply instead:

out <- lapply(1:nrow(X), function(i) { coefficients(lm(Y[i,] ~ X[i,])) } )

Is there a better way?

score 6 · Accepted Answer

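You are definitely over-optimizing here. The overhead of a loop is negligible compared with the cost of fitting the model, so the simple answer is: use whichever way you find easiest to understand. I would probably go with the for loop, but lapply is fine too.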
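As a rough illustration of that point, here is a minimal sketch that runs the same regressions with a for loop and with vapply and confirms that the two give the same coefficient matrix. The data are simulated and the helper name `fit_row` is made up for this example; neither comes from the question.

```r
# Simulated data (an assumption for illustration): 50 independent
# regressions with 20 observations each, one regression per row.
set.seed(1)
n_fits <- 50
n_obs  <- 20
X <- matrix(rnorm(n_fits * n_obs), nrow = n_fits)
Y <- 2 + 3 * X + matrix(rnorm(n_fits * n_obs, sd = 0.1), nrow = n_fits)

# Hypothetical helper: intercept and slope for the i-th row.
fit_row <- function(i) unname(coefficients(lm(Y[i, ] ~ X[i, ])))

# for-loop version: preallocate, then fill one row per regression.
res_loop <- matrix(NA_real_, nrow = n_fits, ncol = 2)
for (i in seq_len(n_fits)) {
  res_loop[i, ] <- fit_row(i)
}

# vapply version: returns a 2 x n_fits matrix, so transpose it.
res_apply <- t(vapply(seq_len(n_fits), fit_row, numeric(2)))

# Compare the two coefficient matrices.
all.equal(res_loop, res_apply)
```

Wrapping either version in system.time() is an easy way to check on your own data how little the looping construct itself contributes.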

score 1 · Accepted Answer

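I use plyr for this kind of thing, but I agree that it is less a question of processing efficiency than of what you are comfortable reading and writing.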

score 0 · Accepted Answer

If you just want to perform straightforward multiple linear regression, then I would recommend not using lm(). There is lsfit(), but I'm not sure it would offer that much of a speed-up (I have never performed a formal comparison). Instead, I would recommend computing the least-squares estimate (X'X)^{-1}X'y using qr() and qr.coef(). This will allow you to perform multivariate multiple linear regression; that is, treating the response variable as a matrix instead of a vector and applying the same regression to each row of observations.

Z # design matrix
Y # matrix of observations (each row is a vector of observations; nrow(Y) must equal nrow(Z))
## Estimation via multivariate multiple linear regression                    
beta <- qr.coef(qr(Z), Y)
## Fitted values                                                             
Yhat <- Z %*% beta
## Residuals                                                                 
u <- Y - Yhat
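To make the snippet above concrete, here is a small self-contained sketch on simulated data that compares the qr.coef() estimate with what lm() returns for the same matrix response. The dimensions, the true coefficient matrix `B`, and the noise level are all assumptions chosen for illustration. Note that qr.coef() expects Y to have as many rows as Z, with one column per response, so data laid out with one regression per row would need t() first.

```r
set.seed(42)
n <- 30                                     # number of observations
Z <- cbind(1, rnorm(n), rnorm(n))           # design matrix: intercept + 2 predictors
B <- matrix(c(1, 2, 3,
              -1, 0.5, 0), nrow = 3)        # assumed true coefficients, one column per response
Y <- Z %*% B + matrix(rnorm(n * 2, sd = 0.1), nrow = n)

# All responses are fitted at once from a single QR decomposition of Z.
beta <- qr.coef(qr(Z), Y)                   # 3 x 2 coefficient matrix

# Reference fit: lm() with a matrix response and no extra intercept,
# because Z already contains a column of ones.
beta_lm <- coef(lm(Y ~ Z - 1))

all.equal(unname(beta), unname(beta_lm))
```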

In your example, is there a different design matrix per vector of observations? If so, you may be able to modify Z in order to still accommodate this.
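If each row does have its own predictor, as in the question's X[i,] and Y[i,], one option for the single-predictor case is to skip the model-fitting functions and use the closed-form simple-regression formulas, vectorized over rows. This is a different technique from modifying Z, offered only as a sketch; the data below are simulated and the check against lm() is there to show the two agree.

```r
set.seed(7)
n_fits <- 100                               # number of independent regressions
n_obs  <- 25                                # observations per regression
X <- matrix(rnorm(n_fits * n_obs), nrow = n_fits)
Y <- 1 - 2 * X + matrix(rnorm(n_fits * n_obs, sd = 0.2), nrow = n_fits)

# Centre each row. Subtracting a vector of length nrow(X) recycles down
# the columns, so row i has its own mean removed.
Xc <- X - rowMeans(X)
Yc <- Y - rowMeans(Y)

# Closed-form least squares for y = a + b * x, one fit per row.
slope     <- rowSums(Xc * Yc) / rowSums(Xc^2)
intercept <- rowMeans(Y) - slope * rowMeans(X)
results   <- cbind(intercept, slope)

# Reference: the same regressions fitted one at a time with lm().
ref <- t(vapply(seq_len(n_fits),
                function(i) unname(coef(lm(Y[i, ] ~ X[i, ]))),
                numeric(2)))

all.equal(unname(results), ref)
```

This only returns coefficients; if you need standard errors or other diagnostics, fitting with lm() row by row remains the simpler route.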
