r - ggplot2 - Multi-group histogram with in-group proportions rather than frequency

Question

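I have three cohorts of students, identified by a factor, ExperimentCohort. For each student I have a LetterGrade, which is also a factor. I would like to plot a histogram-like bar chart of LetterGrade for each ExperimentCohort. Using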

ggplot(df, alpha = 0.2,
       aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort)) +
  geom_bar(position = "dodge")

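gets me very close, but the three ExperimentCohorts have different numbers of students. To compare them on a more even footing, I would like the y-axis to show the within-cohort proportion of each letter grade. So far, short of calculating this proportion and putting it in a separate data frame before plotting, I have not found a way to do this.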

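Every solution to similar questions on SO and elsewhere involves aes(y = ..count../sum(..count..)), but sum(..count..) is evaluated across the whole data frame rather than within each cohort. Does anyone have a suggestion? Here is code to create an example data frame: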

df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

Thanks.

score 24 · Accepted Answer

Incorrect solution

You can use stat_bin() with y=..density.. to get percentages within each group.

ggplot(df, alpha = 0.2,
      aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort))+
      stat_bin(aes(y=..density..), position='dodge')

Update - correct solution

As @rpierce pointed out, y=..density.. computes density values for each group rather than percentages (they are not the same thing).
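To make the difference concrete, here is a small base R sketch. The vector x and the breaks are made up for illustration and are not from the question's data. A histogram's density is the bin's proportion divided by the bin width, so the two only coincide when the bins have width 1.

```r
# Hypothetical numeric scores, chosen only to illustrate the point
x <- c(1, 2, 2, 3, 7, 8, 9, 9, 9, 10)

# Two bins of width 5: [0, 5] and (5, 10]
h <- hist(x, breaks = c(0, 5, 10), plot = FALSE)

proportion <- h$counts / sum(h$counts)  # share of observations per bin
density    <- h$density                 # proportion divided by bin width

print(proportion)  # 0.4 0.6
print(density)     # 0.08 0.12
print(all.equal(density, proportion / diff(h$breaks)))  # TRUE
```

The proportions sum to 1, while the densities sum to 1 only after being multiplied by the bin width.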

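To get correct percentages, one approach is to calculate them before plotting. The function ddply() from the plyr library is used for this. Within each ExperimentCohort, the proportions are calculated with prop.table() and table() and saved as prop. The matching LetterGrade labels are recovered with names() and table().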

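library(plyr)

# plyr::summarise is named explicitly so that a summarise() from another
# attached package (for example dplyr) cannot be picked up instead
df.new <- ddply(df, .(ExperimentCohort), plyr::summarise,
                prop = prop.table(table(LetterGrade)),
                LetterGrade = names(table(LetterGrade)))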

head(df.new)
  ExperimentCohort       prop LetterGrade
1              One 0.21739130           A
2              One 0.08695652           B
3              One 0.13043478           C
4              One 0.13043478           D
5              One 0.30434783           E
6              One 0.13043478           F

Now use this new data frame for plotting. Because the proportions are already calculated, supply them as the y values and add stat="identity" to geom_bar.

ggplot(df.new,aes(LetterGrade,prop,fill=ExperimentCohort))+
  geom_bar(stat="identity",position='dodge')
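The same summary table can also be built without plyr. The following is a minimal sketch in base R, not part of the original answer: the seed and the names tab, df.base and p are illustrative choices, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Example data in the same shape as the question's df (the seed is arbitrary)
set.seed(1)
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# Rows are cohorts, columns are grades; margin = 1 makes each row sum to 1
tab <- prop.table(table(ExperimentCohort = df$ExperimentCohort,
                        LetterGrade = df$LetterGrade),
                  margin = 1)

# as.data.frame() on a table gives one row per cell; the value goes into "prop"
df.base <- as.data.frame(tab, responseName = "prop")

p <- ggplot(df.base, aes(LetterGrade, prop, fill = ExperimentCohort)) +
  geom_col(position = "dodge")
```

Printing p draws the chart. Cohort and grade combinations that never occur appear as rows with prop equal to 0, so every cohort keeps a slot for every grade.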


score 9 · Answer

You can also do this by creating a weight column that sums to 1 within each group:

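library(dplyr)  # provides %>%, group_by(), mutate() and n()

# Each row gets weight 1 / (cohort size), so the weighted count of a bar
# equals that grade's share within its cohort
ggplot(df %>%
         group_by(ExperimentCohort) %>%
         mutate(weight = 1 / n()),
       aes(x = LetterGrade, fill = ExperimentCohort)) +
  geom_histogram(aes(weight = weight), stat = 'count', position = 'dodge')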
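Why the weights give proportions can be checked without plotting. The sketch below uses base R only; the seed and the names cohort_totals, weighted and props are illustrative and not from the original answer.

```r
# Example data in the same shape as the question's df (the seed is arbitrary)
set.seed(1)
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# Base R equivalent of group_by(ExperimentCohort) %>% mutate(weight = 1 / n())
df$weight <- 1 / ave(seq_len(nrow(df)), df$ExperimentCohort, FUN = length)

# The weights of each cohort sum to 1
cohort_totals <- tapply(df$weight, df$ExperimentCohort, sum)

# Summing weights per cohort and grade, which is what a weighted count does
weighted <- xtabs(weight ~ ExperimentCohort + LetterGrade, data = df)

# Within-cohort proportions computed directly
props <- prop.table(table(df$ExperimentCohort, df$LetterGrade), margin = 1)

print(cohort_totals)
print(all.equal(as.vector(weighted), as.vector(props)))
```

The weighted counts and the direct proportions agree cell by cell, which is why the bar heights in the weighted plot can be read as within-cohort shares.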

score 1 · Answer

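I tried this recently and got an error from the ddply call: Column prop must be length 1 (a summary value), not 6. I spent some time on ddply but could not quite get that solution to work, so here is an alternative (note that the code below uses dplyr verbs):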

library(dplyr)

df.new <- df %>%
    group_by(ExperimentCohort, LetterGrade) %>%
    summarise(n = n()) %>%       # count per cohort and grade
    mutate(freq = n / sum(n))    # share within each ExperimentCohort
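The pipeline above relies on summarise() dropping the last grouping variable (LetterGrade), so that the following mutate() sums n within each ExperimentCohort. A variant that spells the grouping out is sketched here, assuming dplyr is installed; the seed and the check object are illustrative additions.

```r
library(dplyr)

# Example data in the same shape as the question's df (the seed is arbitrary)
set.seed(1)
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

df.new <- df %>%
  count(ExperimentCohort, LetterGrade) %>%  # one row per cohort and grade, column n
  group_by(ExperimentCohort) %>%            # make the denominator's grouping explicit
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Sanity check: the frequencies of each cohort should sum to 1
check <- df.new %>%
  group_by(ExperimentCohort) %>%
  summarise(total = sum(freq))
print(check)
```

The resulting df.new has the same ExperimentCohort, LetterGrade, n and freq columns, so it can be passed to the same plotting code.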

Then you can plot it as @didzis-elferts described:

ggplot(df.new,aes(LetterGrade,freq,fill=ExperimentCohort))+
    geom_bar(stat="identity",position='dodge')
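For the question as originally asked (no separate data frame), one more option may be worth trying. This is a sketch that is not part of any of the answers: it assumes ggplot2 3.3.0 or later for after_stat() (older versions spelled it ..prop..) and uses the prop variable that stat_count computes per group. The seed and the names p and bars are illustrative.

```r
library(ggplot2)

# Example data in the same shape as the question's df (the seed is arbitrary)
set.seed(1)
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# stat_count (the default stat of geom_bar) computes `prop`, the proportion
# within each group; group = ExperimentCohort makes that per cohort
p <- ggplot(df, aes(x = LetterGrade, group = ExperimentCohort,
                    fill = ExperimentCohort)) +
  geom_bar(aes(y = after_stat(prop)), position = "dodge")

# Inspect the computed bar heights: they should sum to 1 within each group
bars <- layer_data(p)
print(tapply(bars$y, bars$group, sum))
```

Because the proportion is computed inside the stat, the denominator is the group total rather than the whole data frame, which is the behaviour the ..count../sum(..count..) idiom lacked.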
