r - What is a good way to calculate error bars and add them to a ggplot2 histogram?

Question

The following command produces a simple histogram:

g<- ggplot(data = mtcars, aes(x = factor(carb) )) + geom_histogram()

Normally I add error bars to my plots like this:

g+stat_summary(fun.data="mean_cl_boot",geom="errorbar",conf.int=.95)

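But this does not work for a histogram ("Error: geom_errorbar requires the following missing aesthetics: ymin, ymax"). I think this is because the y variable is not explicit: the counts are computed automatically by geom_histogram, so no y variable is ever declared.

Is it impossible to use geom_histogram here, so that we must first compute the y quantity (the counts) ourselves and then pass it as the y variable in a call to geom_bar?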

score 2 · Accepted Answer

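It does indeed seem that geom_histogram cannot be used, and that we have to compute the counts (bar heights) and the confidence interval limits by hand. First, compute the counts: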

library(plyr)
mtcars_counts <- ddply(mtcars, .(carb), function(x) data.frame(count=nrow(x)))

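The remaining problem is to compute a confidence interval for a binomial proportion, which here is each count divided by the total number of cases in the data set. Several formulas have been proposed in the literature. Here we will use the Agresti & Coull (1998) method as implemented in the PropCIs library.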

library(PropCIs)
numTotTrials <- sum(mtcars_counts$count)

# Create a CI function for use with ddply and based on our total number of cases.
makeAdd4CIforThisHist <- function(totNumCases,conf.int) {
  add4CIforThisHist <- function(df) {
     CIstuff<- add4ci(df$count,totNumCases,conf.int)
     data.frame( ymin= totNumCases*CIstuff$conf.int[1], ymax = totNumCases*CIstuff$conf.int[2] ) 
  }
  return (add4CIforThisHist)
}

calcCI <- makeAdd4CIforThisHist(numTotTrials,.95)

limits<- ddply(mtcars_counts,.(carb),calcCI) #calculate the CI min,max for each bar

mtcars_counts <- merge(mtcars_counts,limits) #combine the counts dataframe with the CIs

g<-ggplot(data =mtcars_counts, aes(x=carb,y=count,ymin=ymin,ymax=ymax)) + geom_bar(stat="identity",fill="grey")
g+geom_errorbar()

[Figure: resulting plot]
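For readers who do not have PropCIs installed, the interval can be sketched by hand in base R. This is a minimal sketch that assumes the "add-4" variant of the Agresti & Coull idea (add two successes and two failures, then apply the usual normal-approximation formula). The helper name `add4_interval` is hypothetical, and its results are not guaranteed to match `add4ci` to the last digit.

```r
# Hypothetical helper: "add-4" interval for a binomial proportion.
# Adds two successes and two failures, then uses the Wald formula.
add4_interval <- function(x, n, conf.level = 0.95) {
  p_tilde <- (x + 2) / (n + 4)
  z <- qnorm(1 - (1 - conf.level) / 2)
  half_width <- z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
  # Clip to [0, 1] so the limits remain valid proportions.
  data.frame(lower = pmax(p_tilde - half_width, 0),
             upper = pmin(p_tilde + half_width, 1))
}

# Counts per carb value, computed with base R instead of plyr.
carb_table <- table(mtcars$carb)
counts <- data.frame(
  carb  = as.numeric(names(carb_table)),
  count = as.vector(carb_table)
)

# Scale the proportion limits back to the count scale for plotting.
n_total <- sum(counts$count)
ci <- add4_interval(counts$count, n_total)
counts$ymin <- n_total * ci$lower
counts$ymax <- n_total * ci$upper
counts
```

The resulting `counts` data frame has the same columns (carb, count, ymin, ymax) that the geom_bar/geom_errorbar call above expects.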

score 1 · Accepted Answer

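I'm not sure that what you want to do is statistically valid.

For example, if we do the summarizing (bin/compute) by hand, we get NA for the upper and lower limits: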

mtcars$carb_bin <- factor(cut(mtcars$cyl,8,labels=FALSE))
library(plyr)
library(Hmisc)  # provides smean.cl.boot
mtcars_sum <- ddply(mtcars, "carb_bin", 
                 function(x)smean.cl.boot(length(x$carb)))
mtcars_sum
  carb_bin Mean Lower Upper
1        1   11    NA    NA
2        4    7    NA    NA
3        8   14    NA    NA

Even if you compute only the y values and then use ggplot2 to draw geom_bar plus an error bar, you still won't get an error bar, because the upper and lower limits are not well defined.

mtcars_sum <- ddply(mtcars, "carb_bin", summarise,
                    y = length(carb))

ggplot(data = mtcars_sum, aes(x=carb_bin,y=y)) + 
  geom_bar(stat='identity',alpha=0.2)+
  stat_summary(fun.data="mean_cl_normal",col='red',
               conf.int=.95,geom='pointrange')

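The reason the limits come back as NA can be illustrated in base R. Each bin supplies a single number (its count), and a sample standard deviation cannot be computed from one observation, so any interval built on it is undefined. This is a small sketch that counts the cyl values directly instead of using cut(). The names `bin_counts` and `bin_sd` are only illustrative.

```r
# One count per distinct cyl value (4, 6, 8).
bin_counts <- as.vector(table(mtcars$cyl))
bin_counts

# The sample sd of a single value is NA, so a mean-based
# confidence interval for one count per bin is undefined.
bin_sd <- sapply(bin_counts, sd)
bin_sd
```

This is why an interval for each bar needs a model for the count itself, such as a binomial proportion, and cannot come from a summary of the mean.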
