编辑:对不起,我是社区的新手。我尝试通过示例数据和代码使其更加清晰。
这是数据(dput的输出):
structure(list(`Sample Name` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), Type = c("A",
"A", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A",
"A", "B", "B", "B", "B", "B", "B", "B"), Size = c(1, 1, 1, 1,
1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), Height = c(270,
280, 290, 295, 292, 285, 305, 330, 125, 130, 140, 142, 123, 117,
140, 135, 132, 145, 160, 170, 136, 154)), row.names = c(NA, -22L
), class = c("tbl_df", "tbl", "data.frame"))
现在我使用过滤器对数据进行分类。我不确定这是否是明智的做法,但到目前为止有效。首先,两个类别有 1 和 3 两种大小,然后每个大小分为两种类型:A 和 B。所以最后我们有 4 种数据。
SizeOne <- filter (Alldata, Size== "1")
SizeThree <- filter (Alldata, Size== "3")
SizeonA <- filter (SizeOne, Type=="A")
SizeoneB <- filter (SizeOne, Type=="B")
SizeThreeA <- filter (SizeThree, Type=="A")
SizeThreeB <- filter (SizeThree, Type=="B")
现在这是绘制 4 个不同类别的累积概率的代码。然后我习惯stat_function
将高斯分布拟合添加到每个累积图。
p2 = ggplot() +
stat_ecdf(data = SizeOne,aes(x= Height, color=SizeOne$Type),geom = "point", size = 1.2, linetype= "twodash", pad= FALSE)+
stat_ecdf(data = SizeThree,aes(x= Height, color=SizeThree$Type),geom = "point", size = 1 , pad= FALSE)+
scale_color_manual(values = c("#e73a00", "#002ee7"))+
labs(title= "Cumulative probability", y = "Cumulative Probability", x= "Height") +
stat_function(data= SizeThreeB, fun = pnorm, color="#e73a00" , args = list(mean=mean(SizeThreeB$Height), sd=sd(SizeThreeB$Height)))+
stat_function(data= SizeThreeA, fun = pnorm, color="#002ee7" , args = list(mean=mean(SizeThreeA$Height), sd=sd(SizeThreeA$Height)))+
stat_function(data= SizeoneB, fun = pnorm, color="#e73a00" , args = list(mean=mean(SizeoneB$Height), sd=sd(SizeoneB$Height)))+
stat_function(data= SizeonA, fun = pnorm, color="#002ee7" , args = list(mean=mean(SizeonA$Height), sd=sd(SizeonA$Height)))
p2
- 现在我的问题是如何将 99%、95% 和 90%(带)的置信区间添加到高斯拟合中?(不是经验累积的)。
- 其次,如何将误差线添加到累积概率点?(到蓝点和蓝点)
到目前为止我的情节