r - Normalizing ggplot2 densities with facet_wrap in R

Question

I'm making a series of density plots from a data frame with geom_density, and showing them conditioned with facet_wrap, like this:

library(ggplot2)

ggplot(iris) + geom_density(aes(x=Sepal.Width, colour=Species, y=..count../sum(..count..))) + facet_wrap(~Species)

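When I do this, the y-axis scale doesn't seem to represent the percentage of each Species within its panel, but rather the percentage of all data points across all species.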

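My question is: how can I make the ..count.. variable in geom_density refer to the number of items in each Species group in each panel, so that the y-axis of the virginica panel corresponds to "fraction of virginica data points"?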

Also, is there a way to make ggplot2 output the values it uses for ..count.. and sum(..count..), so that I can verify which numbers it's using?
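One way to see the numbers ggplot2 actually used is `layer_data()`, which returns a layer's data after the stat has been computed and the aesthetics mapped. A minimal sketch, assuming ggplot2 3.3.0 or later for `after_stat()` (the newer spelling of `..count..`); the object names `p` and `ld` are illustrative:

```r
library(ggplot2)

# The question's plot, with after_stat() instead of the ..count.. notation
p <- ggplot(iris) +
  geom_density(aes(x = Sepal.Width, colour = Species,
                   y = after_stat(count / sum(count)))) +
  facet_wrap(~Species)

# Computed values for layer 1: one row per evaluation point per panel
ld <- layer_data(p)
head(ld[, c("PANEL", "x", "density", "count", "y")])

# Sum of the mapped y inside each panel. The after_stat() expression is
# evaluated on the whole layer at once, so sum(count) is the total over
# all panels and only the three panel sums together add up to 1.
tapply(ld$y, ld$PANEL, sum)
sum(ld$y)
```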

Edit: I misunderstood geom_density. Even for a single Species, ..count../sum(..count..) is not a percentage:

ggplot(iris[iris$Species == 'virginica',]) + geom_density(aes(x=Sepal.Width, colour=Species, y=..count../sum(..count..))) + facet_wrap(~Species)
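Why `count / sum(count)` is not a fraction of data points: a density stat evaluates a smooth curve on a grid of points, and `count` is just `density * n` at each grid point, so dividing by its sum spreads 1 over the grid rather than over bins of data. The sketch below illustrates this with base R's `stats::density()`, which also defaults to 512 evaluation points; treating it as a stand-in for ggplot2's density stat is an assumption, and the names `vir`, `dens` and `dx` are illustrative:

```r
# Sepal.Width for virginica only
vir <- iris$Sepal.Width[iris$Species == "virginica"]
n <- length(vir)                      # 50 observations

dens <- density(vir)                  # curve evaluated on a 512-point grid
dx <- diff(dens$x)[1]                 # spacing between grid points

count <- dens$y * n                   # what the density stat calls "count"
sum(count / sum(count))               # 1 by construction, spread over 512 grid points

# The area under the curve is close to 1, so density * dx approximates
# the fraction of observations falling in a slice of width dx
sum(dens$y * dx)

# Fraction of virginica with Sepal.Width in [2.8, 3.2), two ways
mean(vir >= 2.8 & vir < 3.2)                      # exact, from the data
sum(dens$y[dens$x >= 2.8 & dens$x < 3.2] * dx)    # approximate, from the curve
```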

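So my revised question: how can I make the density plot show the fraction of the data in each bin? Do I have to use stat_density for this, or geom_histogram? I just want the y-axis to be the percentage/fraction of data points.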
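The numbers such a y-axis should show can be computed directly in base R: bin the variable, cross-tabulate by species, and normalize each species' row with `prop.table(margin = 1)`. A sketch; the breaks below are hypothetical, chosen for illustration, and need not coincide with ggplot2's own bin boundaries:

```r
# Hypothetical bins of width 0.2 covering the Sepal.Width range (2.0 to 4.4)
breaks <- seq(2, 4.6, by = 0.2)
bins <- cut(iris$Sepal.Width, breaks = breaks, right = FALSE)

# Counts per species (rows) and bin (columns)
counts <- table(iris$Species, bins)

# Divide each row by its own total: fractions within each species
fractions <- prop.table(counts, margin = 1)
round(fractions, 2)

rowSums(fractions)   # each species sums to 1
```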

score 5 · Accepted Answer

Unfortunately, what you're asking ggplot2 to do is define a separate y for each facet, which AFAIK it can't do syntactically.

So, since you mentioned in the comment thread that you "fundamentally just want a histogram", I suggest using geom_histogram instead or, if you prefer lines to bars, geom_freqpoly:

library(ggplot2)

ggplot(iris, aes(Sepal.Width, ..count..)) + 
  geom_histogram(aes(colour=Species, fill=Species), binwidth=.2) +
  geom_freqpoly(colour="black", binwidth=.2) +
  facet_wrap(~Species)


Note: geom_freqpoly could also replace geom_histogram in my example above. I just added both to one plot for efficiency.
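If the aim is the fraction of each species per bin rather than the raw count, a variant of the same layers can map y to the binned stat's `density` multiplied by the bin width. For `stat_bin()`, `density` is `count / width / sum(count)` within each group, so `density * 0.2` equals the per-group fraction when the bin width is 0.2; here each panel holds a single species. A sketch, assuming ggplot2 3.3.0 or later for `after_stat()` and keeping the answer's `binwidth = .2`:

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Width)) +
  # density * binwidth = count / group total, i.e. the fraction per bin
  geom_histogram(aes(y = after_stat(density * 0.2), colour = Species, fill = Species),
                 binwidth = 0.2) +
  geom_freqpoly(aes(y = after_stat(density * 0.2)), colour = "black", binwidth = 0.2) +
  facet_wrap(~Species) +
  labs(y = "fraction of species")

print(p)

# Check: within every panel, the histogram fractions add up to 1
ld <- layer_data(p, 1)
tapply(ld$y, ld$PANEL, sum)
```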

Hope this helps.

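Edit: OK, I managed to find a quick-and-dirty way to get what you want. It requires installing and loading plyr. Apologies in advance: this may not be the most efficient approach in terms of RAM usage, but it works.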

First, let's put iris into the workspace as its own object (I use RStudio, so I'm used to seeing all my objects in one window):

library(plyr)     # ddply(), summarize()
library(ggplot2)

d <- iris

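Now we can use ddply to count the number of individuals at each unique measurement on your x-axis (here I use Sepal.Length instead of Sepal.Width to give myself more range, just to see bigger differences between the groups when plotted).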

new <- ddply(d, c("Species", "Sepal.Length"), summarize, count=length(Sepal.Length))

Note that ddply automatically sorts the output data.frame by the referenced variables.

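Then we can split the data.frame by each unique condition, which for iris means each of the three species (I'm sure there's a smoother way to do this; if you're working with a very large amount of data, I don't recommend repeatedly creating subsets of the same data frame, since you may max out your RAM)...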
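One smoother way to get the same per-species proportions without splitting the data frame into copies is base R's `ave()`, which applies a function within each group and returns a vector aligned with the original rows. A plyr-free sketch; the `aggregate()` call is an assumed base-R stand-in for the ddply() counting step, not part of the original answer:

```r
d <- iris

# Count individuals for each Species x Sepal.Length combination
# (base-R stand-in for the ddply() count step)
new <- aggregate(count ~ Species + Sepal.Length,
                 data = transform(d, count = 1), FUN = sum)

# Sort like ddply() does: by Species, then Sepal.Length
new <- new[order(new$Species, new$Sepal.Length), ]

# Divide each count by its own species total, in a single pass
new$prop <- ave(new$count, new$Species, FUN = function(x) x / sum(x))

# Totals per species: counts should be 50, proportions 1
tapply(new$count, new$Species, sum)
tapply(new$prop, new$Species, sum)
```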

set <- new[which(new$Species%in%"setosa"),]
ver <- new[which(new$Species%in%"versicolor"),]
vgn <- new[which(new$Species%in%"virginica"),]

...and use ddply again to compute the proportion of individuals at each measurement, separately for each species.

prop <- rbind(ddply(set, c("Species"), summarize, prop=set$count/sum(set$count)),
              ddply(ver, c("Species"), summarize, prop=ver$count/sum(ver$count)),
              ddply(vgn, c("Species"), summarize, prop=vgn$count/sum(vgn$count)))

Then we simply put everything we need into one dataset and clear all the junk out of the workspace.

new$prop <- prop$prop
# remove every object in the workspace except `new` and `d`
rm(list=ls()[which(!ls()%in%c("new", "d"))])

Now we can make our plot with facet-specific proportions on y. Note that I'm now using geom_line, because ddply automatically sorted your data.frame.

ggplot(new, aes(Sepal.Length, prop)) + 
  geom_line(aes(colour=Species)) +   # map the column name, not new$Species
  facet_wrap(~Species)

[Figure: facet_wrap with facet-specific proportions]

# let's check our work. each should equal 50
sum(new$count[which(new$Species%in%"setosa")]) 
sum(new$count[which(new$Species%in%"versicolor")]) 
sum(new$count[which(new$Species%in%"virginica")])

#... and each of these should equal 1
sum(new$prop[which(new$Species%in%"setosa")]) 
sum(new$prop[which(new$Species%in%"versicolor")]) 
sum(new$prop[which(new$Species%in%"virginica")])

score 0 · Accepted Answer

Maybe with table() and barplot() you can get what you need. I'm still not sure whether this is what you're after...

barplot(table(iris[iris$Species == 'virginica',1]))

Using ggplot2:

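library(ggplot2)

tb <- table(iris[iris$Species == 'virginica',1])
tb <- as.data.frame(tb)
# Freq already holds the counts, so draw them as-is instead of counting rows
ggplot(tb, aes(x=Var1, y=Freq)) + geom_bar(stat="identity")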
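The same table can be turned into fractions with `prop.table()`, which divides every count by the table total, so each bar's height becomes the share of virginica observations at that Sepal.Length value. A base-R sketch in the spirit of the answer above; the names `vir` and `frac` are illustrative:

```r
# Sepal.Length (column 1) for virginica only, as in the answer
vir <- iris[iris$Species == "virginica", 1]

# Fraction of virginica observations at each distinct value
frac <- prop.table(table(vir))

barplot(frac, xlab = "Sepal.Length", ylab = "Fraction of virginica")
sum(frac)  # 1
```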

score 0 · Accepted Answer


Passing the argument scales='free_y' to facet_wrap() should do the trick.
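Applied to the question's plot, that looks like the sketch below; `scales = "free_y"` gives every panel its own y-axis range instead of one shared range. It assumes ggplot2 3.3.0 or later for `after_stat()`:

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_density(aes(x = Sepal.Width, colour = Species,
                   y = after_stat(count / sum(count)))) +
  # Each panel gets its own y-axis limits
  facet_wrap(~Species, scales = "free_y")

print(p)
```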

answered 2015-01-22T16:22:48.593
