r - A better way to build confidence bands around the mean/median of observed samples with ggplot2

Question

So I have a three-column data frame containing Trial, an independent Variable, and Observation. Something like:

    df1 <- data.frame(Trial=rep(1:10,5), Variable=rep(1:5, each=10), Observation=rnorm(50))

I am trying to plot a 95% confidence interval around the mean at each value of Variable, using the following rather inefficient method:

    b<-NULL
    b$mean<- aggregate(Observation~Variable, data=df1,mean)[,2]
    b$sd  <- aggregate(Observation~Variable, data=df1,sd)[,2]
    b$Variable<- df1$Variable
    b$Observation <- df1$Observation 
    b$ucl <- rep(qnorm(.975, mean=b$mean, sd=b$sd), each=10)
    b$lcl <- rep(qnorm(.025, mean=b$mean, sd=b$sd), each=10)
    b<- as.data.frame(b)
    c <- ggplot(b, aes(Variable, Observation))  
    c + geom_point(color="red") + 
    geom_smooth(aes(ymin = lcl, ymax = ucl), data=b, stat="summary", fun.y="mean")

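This is inefficient because it duplicates the values of ymin and ymax for every observation. I have looked at the geom_ribbon approach, but I would still need to replicate the values. If I were using any kind of smoothing, such as glm, the code would be much simpler, with no duplication. Is there a better way to do this?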

References: 1. R: plotting confidence bands with ggplot 2. Shading confidence intervals manually with ggplot2 3. http://docs.ggplot2.org/current/geom_smooth.html

score 10 · Accepted Answer

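With this approach I get the same output as with your method. It is inspired by the ggplot documentation. Again, this only makes sense as long as every x value has multiple points.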

set.seed(1)
df1 <- data.frame(Trial=rep(1:10,5), Variable=rep(1:5, each=10), Observation=rnorm(50))

my_ci <- function(x) data.frame(
  y=mean(x), 
  ymin=mean(x) - 2 * sd(x), 
  ymax=mean(x) + 2 * sd(x)
)
ggplot(df1, aes(Variable, Observation)) + geom_point(color="red") +
  stat_summary(fun.data="my_ci", geom="smooth")
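
To see why this avoids the duplication in the question, it may help to look at what the summary function produces: it is applied once per distinct x value, so only one row per group is needed. The sketch below rebuilds that per-group table in base R. The name `ci_table` is made up for this illustration, and the data are regenerated with `rnorm(50)`.

```r
set.seed(1)
df1 <- data.frame(Trial = rep(1:10, 5),
                  Variable = rep(1:5, each = 10),
                  Observation = rnorm(50))

# Same summary function as in the answer: mean plus/minus two standard deviations
my_ci <- function(x) data.frame(
  y = mean(x),
  ymin = mean(x) - 2 * sd(x),
  ymax = mean(x) + 2 * sd(x)
)

# Apply my_ci once per level of Variable and stack the one-row results
groups <- split(df1$Observation, df1$Variable)
ci_table <- do.call(rbind, lapply(groups, my_ci))
ci_table$Variable <- as.integer(names(groups))  # illustrative column, not needed by ggplot2

print(ci_table)  # five rows, one per Variable value
```

The table has five rows rather than fifty, which is the saving the question was after.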


score 7 · Accepted Answer

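The ggplot2 package comes with wrappers for a number of summary functions from the Hmisc package, including

mean_cl_normal, which calculates confidence limits based on the t distribution,
mean_cl_boot, which uses a bootstrap method that does not assume a distribution for the mean,
mean_sdl, which uses a multiple of the standard deviation (default = 2).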

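The last of these is the same as the method in the answer above, but it is not a 95% CL. The confidence limits based on the t distribution are given by: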

CL = t × s / √n

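where t is the appropriate quantile of the t distribution, s is the sample standard deviation, and n is the number of observations. Compare the confidence bands: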

# mean_sdl needs the Hmisc package to be installed
ggplot(df1, aes(x=Variable, y=Observation)) + 
  stat_summary(fun.data="mean_sdl", geom="line", colour="blue")+
  # in current ggplot2, arguments for the summary function go in fun.args
  stat_summary(fun.data="mean_sdl", fun.args=list(mult=2), geom="errorbar", 
               width=0.1, linetype=2, colour="blue")+
  geom_point(color="red") +
  labs(title=expression(paste(bar(x)," \u00B1 ","2 * sd")))

# mean_cl_normal needs the Hmisc package to be installed
ggplot(df1, aes(x=Variable, y=Observation)) + 
  geom_point(color="red") +
  stat_summary(fun.data="mean_cl_normal", geom="line", colour="blue")+
  # in current ggplot2, arguments for the summary function go in fun.args
  stat_summary(fun.data="mean_cl_normal", fun.args=list(conf.int=0.95), geom="errorbar", 
               width=0.1, linetype=2, colour="blue")+
  stat_summary(fun.data="mean_cl_normal", geom="point", size=3, 
               shape=1, colour="blue")+
  labs(title=expression(paste(bar(x)," \u00B1 ","t * sd / sqrt(n)")))
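
The t-based limits can also be checked by hand. The following is a minimal base R sketch of the formula CL = t × s / √n, not the Hmisc implementation itself. The helper `t_halfwidth` and the table `cl_table` are hypothetical names, and the data are regenerated with the same seed as in the first answer.

```r
set.seed(1)
df1 <- data.frame(Trial = rep(1:10, 5),
                  Variable = rep(1:5, each = 10),
                  Observation = rnorm(50))

# Half-width of a two-sided confidence interval for the mean: t * s / sqrt(n)
t_halfwidth <- function(x, conf.int = 0.95) {
  n <- length(x)
  t_quantile <- qt(1 - (1 - conf.int) / 2, df = n - 1)
  t_quantile * sd(x) / sqrt(n)
}

groups <- split(df1$Observation, df1$Variable)
means <- sapply(groups, mean)
half <- sapply(groups, t_halfwidth)

cl_table <- data.frame(Variable = as.integer(names(groups)),
                       y = means,
                       ymin = means - half,
                       ymax = means + half)
print(cl_table)

# For comparison: half-width of the mean +/- 2 * sd band
sd_half <- 2 * sapply(groups, sd)
print(sd_half / half)  # how many times wider the 2 * sd band is
```

With ten observations per group the ratio is the same for every group, 2 × √10 / t, or roughly 2.8, which is why the 2 × sd bars in the first plot are so much wider than the 95% confidence limits in the second.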

Finally, rotating the last plot with coord_flip() produces something very close to a Forest Plot, which is a standard way of summarizing data such as yours.

ggplot(df1, aes(x=Variable, y=Observation)) + 
  geom_point(color="red") +
  # in current ggplot2, arguments for the summary function go in fun.args
  stat_summary(fun.data="mean_cl_normal", fun.args=list(conf.int=0.95), geom="errorbar", 
               width=0.2, colour="blue")+
  stat_summary(fun.data="mean_cl_normal", geom="point", size=3, 
               shape=1, colour="blue")+
  # dashed reference line at the overall mean of all observations
  geom_hline(aes(yintercept=mean(Observation)), linetype=2)+
  labs(title="Forest Plot")+
  coord_flip()
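
The bootstrap option listed at the start of this answer (mean_cl_boot) is not plotted above. As a rough illustration of the idea, here is a percentile bootstrap sketch in base R that works for the mean or the median. It is similar in spirit to mean_cl_boot but is not the Hmisc code; `boot_ci` and its arguments are hypothetical names chosen for this example.

```r
set.seed(1)
df1 <- data.frame(Trial = rep(1:10, 5),
                  Variable = rep(1:5, each = 10),
                  Observation = rnorm(50))

# Percentile bootstrap interval for an arbitrary statistic (hypothetical helper)
boot_ci <- function(x, stat = mean, B = 1000, conf.int = 0.95) {
  # Recompute the statistic on B resamples drawn with replacement
  boot_stats <- replicate(B, stat(sample(x, replace = TRUE)))
  alpha <- 1 - conf.int
  limits <- quantile(boot_stats, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(y = stat(x), ymin = limits[1], ymax = limits[2])
}

groups <- split(df1$Observation, df1$Variable)
boot_mean <- do.call(rbind, lapply(groups, boot_ci, stat = mean))
boot_median <- do.call(rbind, lapply(groups, boot_ci, stat = median))

print(boot_mean)
print(boot_median)
```

Because `boot_ci` returns a one-row data frame with `y`, `ymin` and `ymax`, it has the same shape as `my_ci` in the first answer, so a function like it could be tried as the `fun.data` argument of `stat_summary()`.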
