0

我的数据结构如下:

group_id, months_from_start, perc_total_downloads, experience_ratio
1             1                    1.2                4
1             2                    1.7                6
…
235           1                    6.7                3
235           2                   18                  8
…

大约有 300 个组,每个组有 70 个左右的连续数据元素。

我发布了以下脚本来估计每个组的二阶多项式。

s.1<-lm(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$months_from_start, 2, raw=TRUE))
s.235<-lm(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$months_from_start, 2, raw=TRUE))
s.599<-lm(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$months_from_start, 2, raw=TRUE))
s.1111<-lm(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$months_from_start, 2, raw=TRUE))
s.1537<-lm(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$months_from_start, 2, raw=TRUE))

对于这些新变量中的每一个,我都可以发布一个摘要声明来揭示有趣的信息:

> summary(s.44375)

Call:
lm(formula = xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 
44375, ][, 2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 
44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, 
][, 2]))$months_from_start, 2, raw = TRUE))


Residuals:
       Min         1Q     Median         3Q        Max 
-0.0064004 -0.0017315 -0.0002022  0.0012087  0.0078436 


Coefficients: (3 not defined because of singularities)
                                                                                                                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                                                                                                                       1.993e-03  1.137e-03   1.753    0.084 .  
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.0  7.769e-04  6.707e-05  11.583   <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)2.0 -9.258e-06  8.404e-07 -11.017   <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.1         NA         NA      NA       NA    
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.1         NA         NA      NA       NA    
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.2         NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


Residual standard error: 0.002866 on 69 degrees of freedom
Multiple R-squared: 0.6619,Adjusted R-squared: 0.6521 
F-statistic: 67.53 on 2 and 69 DF,  p-value: < 2.2e-16 

出于我的目的,我需要将此信息转录成表格,从这种格式剪切和粘贴非常繁琐且耗时:

group_id   intercept est  intercept stnd err    intercept t value   …
44375         1.993e-03         1/137e-03           1.753          ...
…

使用传统记数法而不是科学记数法对我来说也很方便,但我想我可以没有它。

我有什么办法可以做到这一点而无需手动剪切和粘贴?

谢谢--sw

4

1 回答 1

2

summary 函数只返回一个 R 列表。例如,

R> x = runif(10);y=runif(10)
R> m = lm(y ~ x)

您感兴趣的部分是第四个元素:

R> summary(m)[[4]]
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.44041     0.1768  2.4911  0.03746
x           -0.05899     0.3143 -0.1877  0.85579

这只是一个矩阵。


以上回答了你的问题,但你的代码让我想哭!特别是,阅读for循环和plyr包。例如,我怀疑最后两行几乎可以满足您的所有需求:

##Load the package and create some data
library(plyr)
dd = data.frame(group_id = sample(1:3, 10, TRUE), x = runif(10), y=runif(10)) 

##Split up dd by group_id and do some regression
dd1 = ddply(dd, .(group_id), summarise, summary(lm(y ~ x))[[4]])

##Label the column names
colnames(dd1)[2:5] = c("Estimate"   "Std. Error" "t value"    "Pr(>|t|)")
于 2012-09-19T14:14:15.013 回答