虽然你可以用 (d)plyr 走得更远,但我通过一个简单的 for 循环获得了最大的灵活性(因为我对 do() 不太自信,因为我想要的不仅仅是 coefficients() 的输出,例如)
让我们从加载数据和加载一些包开始:
library(dplyr)
library(broom) # get lm output (coefficients) as a dataframe
library(reshape2) # Melt / DCAST for swapping between wide / long format
data(warpbreaks)
接下来,创建一个我们将要迭代的组列表。
GROUPS <- unique(warpbreaks$tension) # Specify groups, one unique model per group.
在这种情况下,回答您的问题的代码将是这样的:
for (i in 1:length(GROUPS)){
CURRENT_GROUP <- GROUPS[i]
df <- filter(warpbreaks, tension==CURRENT_GROUP) # subset the dataframe
fit <- lm(breaks ~ wool, data = df) # Build a model
coeff <- tidy(fit) # Get a pretty data frame of the coefficients & p values
coeff <- coeff[,c(1,2,5)] # Extract P.Value & Estimate
# Rename (intercept) to INT for pretty column names
coeff[coeff$term=="(Intercept)", ]$term <- "INT"
# Make it into wide format with reshape2 package.
coeff <- coeff %>% melt(id.vars=c("term"))
# Defactor the resulting data.frame
coeff <- mutate(coeff,
variable=as.character(variable),
term=as.character(term))
# Rename for prettier column names later
coeff[coeff$variable=="estimate", ]$variable <- "Beta"
coeff[coeff$variable=="p.value", ]$variable <- "P"
coeff <- dcast(coeff, .~term+variable)[,-1]
rsquared <- summary(fit)$r.squared
# Create a df out of what we just did.
row <- cbind(
data.frame(
group=CURRENT_GROUP,
rsquared=rsquared),
coeff
)
# If first iteration, create data.frame -- otherwise: rowbind
if (i==1){
RESULT_ROW = row
}
else{
RESULT_ROW = rbind(RESULT_ROW, row)
} # End if.
} # End for loop
在编写内部 for 循环代码时,只需通过模拟循环来测试一些东西(首先运行 i <- 1,逐步运行内部循环代码,然后运行 i <- 2 等)。结果数据框:
RESULT_ROW
## group rsquared INT_Beta INT_P woolB_Beta woolB_P
## 1 L 0.26107601 44.55556 9.010046e-08 -16.333333 0.03023435
## 2 M 0.07263228 24.00000 5.991177e-07 4.777778 0.27947884
## 3 H 0.12666292 24.55556 9.234773e-08 -5.777778 0.14719571
我并不是说这比上面描述的答案更好,我只是喜欢这样一个事实,即这让我可以灵活地从任何模型中提取任何东西(不仅仅是具有漂亮的 coefficients() 函数的 lm 模型)。