r - R - dplyr summary over combinations of factors

Question

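If I have a simple data frame with 2 factors (a and b), each with 2 levels (1 and 2), and 1 variable (x), how do I get the median of x for each level of factor a, for each level of factor b, and for each combination of a*b?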

library(dplyr)    
df <- data.frame(a = as.factor(c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)),
   b = as.factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
   x = c(runif(16)))

I have tried various (many) versions of:

df %>%
   group_by_(c("a", "b")) %>%
   summarize(med_rate = median(df$x))

The median of x for each level of factor a should look like this:

a median
1 0.58811
2 0.53167

The median of x for each level of factor b should look like this:

b median
1 0.60622
2 0.46096

The median of x for each combination of a and b should look like this:

a b median
1 1 0.66745
1 2 0.34656
2 1 0.50903
2 2 0.55990

Thanks in advance for your help.

score 0 · Accepted Answer

The following is not very elegant, but it creates a single data.frame that matches your expected results.

We create three data.frames (for a, b, and a*b) and combine them into one.

bind_rows(
  df %>% 
    group_by(a) %>% 
    rename(factor = a) %>% 
    summarize(med_rate = median(x)),
  df %>% 
    group_by(b) %>% 
    rename(factor = b) %>% 
    summarize(med_rate = median(x)),
  df %>% 
    # We create a column for grouping a*b
    mutate(factor = paste(a, b)) %>% 
    group_by(factor) %>% 
    summarize(med_rate = median(x))
)
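
If you would rather have the three tables separately, as laid out in the question, a minimal sketch could look like the one below. It assumes dplyr 1.0 or later (for the `.groups` argument); the seed and the names `by_a`, `by_b`, and `by_ab` are my own additions for illustration. Note that it uses `median(x)` rather than `median(df$x)`: the latter pulls the whole column from the original data frame, so it ignores the grouping and returns the overall median for every group.

```r
library(dplyr)

set.seed(123)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(
  a = factor(rep(1:2, each = 8)),
  b = factor(rep(rep(1:2, each = 4), times = 2)),
  x = runif(16)
)

# Median of x for each level of a
by_a <- df %>%
  group_by(a) %>%
  summarize(median = median(x))

# Median of x for each level of b
by_b <- df %>%
  group_by(b) %>%
  summarize(median = median(x))

# Median of x for each a*b combination;
# .groups = "drop" returns an ungrouped tibble
by_ab <- df %>%
  group_by(a, b) %>%
  summarize(median = median(x), .groups = "drop")
```

Printing `by_a`, `by_b`, and `by_ab` gives one table per grouping, with 2, 2, and 4 rows respectively.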

score 0 · Accepted Answer

set.seed(123) ##make your example reproducible
require(data.table)
df <- data.table(a = as.factor(c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)),
             b = as.factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
             x = c(runif(16)))

df[, median(x), by = a]
df[, median(x), by = b]
df[, median(x), by = .(a,b)]
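
The three calls above return the median in a column with the default name `V1`. If a named column is preferred, a small variation along these lines should work; the object names (`dt`, `med_a`, `med_b`, `med_ab`) and the column name `median` are my own choices for illustration.

```r
library(data.table)

set.seed(123)  # same seed as above, for reproducibility
dt <- data.table(
  a = factor(rep(1:2, each = 8)),
  b = factor(rep(rep(1:2, each = 4), times = 2)),
  x = runif(16)
)

# Wrap the summary in .() to name the result column instead of getting V1
med_a  <- dt[, .(median = median(x)), by = a]
med_b  <- dt[, .(median = median(x)), by = b]
med_ab <- dt[, .(median = median(x)), by = .(a, b)]
```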
