r - Averaging geom_density(y=..count..) over a grouping variable

Question

I plot some distributions using:

geom_density(aes(my.variable,
color=my.factor,
group=my.replicates,
y=..count..))

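I do not have the same number of replicates within each level of my.factor, so I cannot simply drop the 'group' argument: ..count.. depends on the number of replicates. What I would like is something like ..count../number of replicates.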

Here is the context and a reproducible example

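I sampled 2 habitats (a and b), recording the number of fish and the body size of each individual. The sampling effort differed between habitats (ra and rb are the number of replicates sampled in habitats a and b, respectively). I am interested in the mean difference between habitats in both fish abundance and body size. However, I don't know how to handle the fact that I don't have the same number of replicates.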

Data

#number of replicates
ra=4;rb=6
#number of individuals (lambda of poisson distribution)
na=30;nb=60
#size of individuals (lambda of poisson distribution)
sa=90;sb=80

#data for habitat a
dfa=data.frame()
for (ri in 1:ra){
  habitat="a"
  nb_rep=ra
  replicat=paste("r",ri,sep="")
  size=rpois(rpois(1,na),sa)
  dfa=rbind.data.frame(dfa,data.frame(habitat,nb_rep,replicat,size))
}
#data for habitat b
dfb=data.frame()
for (ri in 1:rb){
  habitat="b"
  nb_rep=rb
  replicat=paste("r",ri,sep="")
  size=rpois(rpois(1,nb),sb)
  dfb=rbind.data.frame(dfb,data.frame(habitat,nb_rep,replicat,size))
}
#whole data set
df=rbind(dfa,dfb)

Plots

require(ggplot2)
summary(df)

Density

ggplot(df,aes(size,color=habitat))+
geom_density(aes(y=..density..))

Count

ggplot(df,aes(size,color=habitat))+
geom_density(aes(y=..count..))

However, this is biased if the habitats were not sampled with the same effort, i.e. with different numbers of replicates.
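To make the bias concrete, here is a minimal sketch that tabulates sampling effort per habitat. It uses simulated data as a stand-in for the df built above; `sim_habitat` and `effort` are hypothetical names introduced only for this illustration. The area under a pooled ..count.. curve is roughly the total number of individuals, so it grows with the number of replicates, whereas the individuals per replicate is the figure that is comparable between habitats.

```r
# Simulated stand-in for the question's data: 4 replicates in a, 6 in b.
# sim_habitat is a hypothetical helper, not part of the original post.
set.seed(1)
sim_habitat <- function(habitat, n_rep, n_ind, mean_size) {
  do.call(rbind, lapply(seq_len(n_rep), function(i) {
    data.frame(habitat = habitat,
               replicat = paste0("r", i),
               size = rpois(rpois(1, n_ind), mean_size))
  }))
}
df <- rbind(sim_habitat("a", 4, 30, 90), sim_habitat("b", 6, 60, 80))

# Total individuals per habitat (what the pooled count curve integrates to)
total <- tapply(df$size, df$habitat, length)
# Number of replicates actually sampled in each habitat
n_rep <- tapply(df$replicat, df$habitat, function(x) length(unique(x)))

# Individuals per replicate: the effort-corrected abundance
effort <- data.frame(habitat = names(total),
                     total = as.vector(total),
                     n_rep = as.vector(n_rep),
                     per_replicate = as.vector(total / n_rep))
print(effort)
```

Comparing the total column with the per_replicate column shows how much of the apparent difference between habitats is due to effort alone.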

Count, taking the different replicates into account

ggplot(df,aes(size,color=habitat,group=paste(habitat,replicat)))+
geom_density(aes(y=..count..))

From this last plot, how can I get the mean line across replicates? Thanks

score 2 · Accepted Answer

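I don't think you can do this within ggplot. You can compute the densities yourself and then plot the computed densities. Below I show that this works by reproducing the plot you already have, ggplot(df,aes(size,color=habitat)) + geom_density(aes(y=..count..)).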

require(plyr)
# calculate the density
res <- dlply(df, .(habitat), function(x) density(x$size))
dd <- ldply(res, function(z){
  data.frame(size = z[["x"]], 
             count = z[["y"]]*z[["n"]])
})
# these two plots are essentially the same. 
ggplot(dd, aes(size, count, color=habitat)) + 
  geom_line()
ggplot(df,aes(size,color=habitat))+
  geom_density(aes(y=..count..))
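The rescaling used above (density times sample size) can be checked numerically. The sketch below uses a hypothetical sample of 120 body sizes rather than the question's data: it multiplies the output of density() by the n component of the result and integrates with the trapezoid rule. The variable names are illustrative only.

```r
set.seed(42)
size <- rpois(120, 90)          # hypothetical sample of 120 body sizes

d <- density(size)              # kernel density estimate, integrates to about 1
count <- d$y * d$n              # same rescaling as in the answer: density * n

# Trapezoid rule over the evaluation grid d$x
area <- sum(diff(d$x) * (head(count, -1) + tail(count, -1)) / 2)
print(c(n = d$n, area = area))
```

The area should come out close to the sample size, which is why a pooled count curve scales with the number of replicates.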

Now for the slightly harder task: averaging the densities over the different replicates.

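# calculate the density
res <- dlply(df, .(habitat), function(dat){
  # one density per replicate, all evaluated on the same grid:
  # from and to are based on dat (the whole habitat), not x
  lst <- dlply(dat, .(replicat), function(x) density(x$size,
                                                     from=min(dat$size),
                                                     to=max(dat$size)))
  # number of replicates; length(unique()) works whether replicat
  # is a factor or a character vector (droplevels() fails on character)
  n_rep <- length(unique(dat$replicat))
  # mean density across replicates, scaled by the mean number of
  # individuals per replicate
  data.frame(size=lst[[1]][["x"]],
             count=colMeans(laply(lst, function(a) a[["y"]]), na.rm=TRUE)*nrow(dat)/n_rep,
             habitat=dat$habitat[1])
})
# base R equivalent of rbindlist(), which would require data.table
dd <- do.call(rbind, res)
ggplot(dd, aes(size, count, color=habitat)) +
  geom_line()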
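Note that the code above multiplies the mean of the per-replicate densities by the mean number of individuals per replicate. If what is wanted is instead the literal average of the per-replicate ..count.. curves (each replicate's density scaled by its own sample size), a base R sketch might look like the following. This is an alternative reading of the question, not the answer's method. `sim_habitat` and `mean_count_curve` are hypothetical helper names, and the simulated data only stands in for the question's df.

```r
# Simulated stand-in for the question's data (hypothetical helper)
set.seed(1)
sim_habitat <- function(habitat, n_rep, n_ind, mean_size) {
  do.call(rbind, lapply(seq_len(n_rep), function(i) {
    data.frame(habitat = habitat,
               replicat = paste0("r", i),
               size = rpois(rpois(1, n_ind), mean_size))
  }))
}
df <- rbind(sim_habitat("a", 4, 30, 90), sim_habitat("b", 6, 60, 80))

# Average the per-replicate count curves within one habitat
mean_count_curve <- function(dat, n_grid = 512) {
  from <- min(dat$size)
  to <- max(dat$size)
  # one density per replicate, all on the same grid for this habitat
  dens <- lapply(split(dat$size, dat$replicat), function(s) {
    density(s, from = from, to = to, n = n_grid)
  })
  # each column is one replicate's count curve: density * its own n
  curves <- sapply(dens, function(d) d$y * d$n)
  data.frame(habitat = dat$habitat[1],
             size = dens[[1]]$x,
             count = rowMeans(curves))
}

dd <- do.call(rbind, lapply(split(df, df$habitat), mean_count_curve))

# Optional plot, built only if ggplot2 is installed; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(dd, ggplot2::aes(size, count, color = habitat)) +
    ggplot2::geom_line()
}
```

The two approaches coincide when every replicate has the same number of individuals. Otherwise this version gives larger replicates more weight in the shape of the curve.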
