
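I have a dataset (data frame) with 5 columns, all of which contain numeric values.

I would like to run a simple linear regression for every pair of columns in the dataset.

For example, if the columns are named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), and then plot the data for each pair together with its regression line.

I am fairly new to R, so I am not sure how to actually implement this. Should I use ddply? Or lapply? I am not sure how to approach the problem.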


3 Answers


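Here is one solution using combn. Because lm is given a two-column data frame rather than a formula, it regresses the first column of each pair on the second: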

 combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)

Example:

set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
                 B=rnorm(50, 100, 3),
                 C=rnorm(50, 100, 3),
                 D=rnorm(50, 100, 3),
                 E=rnorm(50, 100, 3))
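
A hedged variation on the combn call above, assuming the example DF: each formula is built explicitly with reformulate(), and every fit is labelled with its full formula so that it can be looked up by name. The names `var_pairs` and `fits` are illustrative and do not come from the original answer.

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3),
                 B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3),
                 D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# All unordered pairs of column names, as a list of length-2 character vectors
var_pairs <- combn(names(DF), 2, simplify = FALSE)

# For each pair, regress the first variable on the second
fits <- lapply(var_pairs, function(p) {
  lm(reformulate(p[2], response = p[1]), data = DF)
})

# Label each fit with its formula, e.g. "A ~ B"
names(fits) <- vapply(var_pairs, function(p) paste(p[1], "~", p[2]),
                      character(1))

names(fits)
coef(fits[["A ~ B"]])
```

Using an explicit formula keeps the direction of each regression visible in the code, at the cost of a few more lines than passing the two-column data frame straight to lm.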

Update: adding @Henrik's suggestion (see comments)

# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # adding names to identify the response variable in each regression
> results
$A
 (Intercept)            B 
103.66739418  -0.03354243 

$A
(Intercept)           C 
97.88341555  0.02429041 

$A
(Intercept)           D 
122.7606103  -0.2240759 

$A
(Intercept)           E 
99.26387487  0.01038445 

$B
 (Intercept)            C 
99.971253525  0.003824755 

$B
 (Intercept)            D 
102.65399702  -0.02296721 

$B
(Intercept)           E 
96.83042199  0.03524868 

$C
(Intercept)           D 
 80.1872211   0.1931079 

$C
(Intercept)           E 
 89.0503893   0.1050202 

$D
 (Intercept)            E 
107.84384655  -0.07620397 
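
The question also asks for a plot of each pair with its regression line, which the output above does not cover. The following is a minimal base-graphics sketch, assuming the example DF; the helper name `plot_pair` and the layout settings are hypothetical choices, not part of the original answer.

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3),
                 B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3),
                 D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# Hypothetical helper: scatter plot of y against x with the line from lm(y ~ x)
plot_pair <- function(data, y, x) {
  fit <- lm(reformulate(x, response = y), data = data)
  plot(data[[x]], data[[y]], xlab = x, ylab = y, main = paste(y, "~", x))
  abline(fit, col = "red")   # abline() accepts a simple-regression lm object
  invisible(fit)
}

# Draw all 10 pairs on a 2 x 5 grid, then restore the previous settings
old_par <- par(mfrow = c(2, 5), mar = c(4, 4, 2, 1))
fits <- combn(names(DF), 2,
              function(p) plot_pair(DF, y = p[1], x = p[2]),
              simplify = FALSE)
par(old_par)
```

For a quick look without fitted lines, pairs(DF) draws the full scatter-plot matrix in one call.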
Answered 2013-09-22T19:13:38.687

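I would suggest also looking at the correlation matrix (cor(DF)), which is usually the best way to spot linear relationships between variables. The correlation is closely related to the covariance and to the slope of a simple linear regression. The calculations below illustrate this link.

Sample data: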

set.seed(1)
DF <- data.frame(
  A=rnorm(50, 100, 3),
  B=rnorm(50, 100, 3),
  C=rnorm(50, 100, 3),
  D=rnorm(50, 100, 3),
  E=rnorm(50, 100, 3)
)

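The regression slope is cov(x, y) / var(x). In the matrix below, the entry in row x and column y is the slope of lm(y ~ x):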

beta = cov(DF) * (1/diag(var(DF)))

            A            B           C           D           E
A  1.00000000 -0.045548503 0.028448192 -0.32982367  0.01800795
B -0.03354243  1.000000000 0.003298708 -0.02489518  0.04501362
C  0.02429041  0.003824755 1.000000000  0.24269838  0.15550116
D -0.22407592 -0.022967212 0.193107904  1.00000000 -0.08977834
E  0.01038445  0.035248685 0.105020194 -0.07620397  1.00000000
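
The link to the correlation can be made explicit: for a simple regression of y on x, the slope equals cor(x, y) * sd(y) / sd(x). The check below is a sketch that assumes the sample DF above; the names `s` and `beta_from_cor` are illustrative.

```r
set.seed(1)
DF <- data.frame(
  A = rnorm(50, 100, 3),
  B = rnorm(50, 100, 3),
  C = rnorm(50, 100, 3),
  D = rnorm(50, 100, 3),
  E = rnorm(50, 100, 3)
)

# Entry [x, y] = cov(x, y) / var(x), the slope of lm(y ~ x)
beta <- cov(DF) * (1 / diag(var(DF)))

# Column standard deviations
s <- sapply(DF, sd)

# outer(1 / s, s)[x, y] = sd(y) / sd(x), so this rescales each correlation
beta_from_cor <- cor(DF) * outer(1 / s, s)

# Both routes should give the same matrix
all.equal(beta, beta_from_cor)

# The slope of lm(A ~ B) sits in row B, column A
all.equal(beta["B", "A"], coef(lm(A ~ B, data = DF))[["B"]])
```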

The intercept is mean(y) - beta * mean(x). With the entry in row x and column y describing lm(y ~ x), mean(y) has to vary by column and mean(x) by row:

m <- colMeans(DF)
alpha <- matrix(m, nrow = length(m), ncol = length(m), byrow = TRUE, dimnames = dimnames(beta)) - beta * m

The diagonal of alpha is 0 (up to rounding error), and each off-diagonal entry is the intercept that lm reports for that pair. For example, alpha["B", "A"] is 103.66739, the intercept of lm(A ~ B).
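
As a sanity check, the slope and intercept for every ordered pair can be compared with what lm() reports. In this sketch (assuming the sample DF; `m`, `alpha`, and `ok` are illustrative names) the entry in row x and column y describes lm(y ~ x), so the intercept is mean(y) minus the slope times mean(x).

```r
set.seed(1)
DF <- data.frame(
  A = rnorm(50, 100, 3),
  B = rnorm(50, 100, 3),
  C = rnorm(50, 100, 3),
  D = rnorm(50, 100, 3),
  E = rnorm(50, 100, 3)
)

m <- colMeans(DF)
beta <- cov(DF) * (1 / diag(var(DF)))   # [x, y] = slope of lm(y ~ x)

# outer(rep(1, n), m)[x, y] = mean(y); (beta * m)[x, y] = slope * mean(x)
alpha <- outer(rep(1, length(m)), m) - beta * m
dimnames(alpha) <- dimnames(beta)

# Compare with lm() for every ordered pair with x != y
ok <- TRUE
for (x in names(DF)) {
  for (y in setdiff(names(DF), x)) {
    cf <- unname(coef(lm(reformulate(x, response = y), data = DF)))
    ok <- ok && isTRUE(all.equal(cf, c(alpha[x, y], beta[x, y])))
  }
}
ok
```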
Answered 2013-09-22T20:04:26.490

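Use combn to get all combinations of the column names (in the example below I assume you only need combinations of two columns) and Map to run the loop.

Example using the mtcars data in R: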

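colc <- names(mtcars)
colcc <- combn(colc, 2)        # 2 x 55 matrix: one column per pair
colcc <- data.frame(colcc)
# iterate over the columns of colcc (one per pair), not over its two rows
kk <- Map(function(x) lm(as.formula(paste(colcc[1, x], "~", paste(colcc[2, x], collapse = "+"))), data = mtcars), as.list(seq_len(ncol(colcc))))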

 head(kk)
[[1]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)          cyl  
     37.885       -2.876  


[[2]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)         disp  
   29.59985     -0.04122  


[[3]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)           hp  
   30.09886     -0.06823  


[[4]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)         drat  
     -7.525        7.678  


[[5]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)           wt  
     37.285       -5.344  


[[6]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)         qsec  
     -5.114        1.412  
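
With 55 pairwise fits, printing each model is unwieldy, and the Call line above does not show which variables were used. One possible sketch collects the intercept and slope of every pair into a single data frame; the object `coef_table` and its column names are assumptions made for illustration.

```r
# 2 x 55 character matrix: row 1 = response, row 2 = predictor
cols <- combn(names(mtcars), 2)

# One row of results per pair, stacked into a single data frame
coef_table <- do.call(rbind, lapply(seq_len(ncol(cols)), function(i) {
  fit <- lm(reformulate(cols[2, i], response = cols[1, i]), data = mtcars)
  data.frame(response  = cols[1, i],
             predictor = cols[2, i],
             intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]),
             stringsAsFactors = FALSE)
}))

head(coef_table)
```

A flat table like this is easy to sort or filter, for example by the absolute value of the slope.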
Answered 2013-09-22T19:25:54.770