The same plot from raw data and from summarized data
For the following data structure:
dsN <- data.frame(
  id = rep(1:100, each = 4),
  yearF = factor(rep(2001:2004, 100)),
  attendF = sample(1:8, 400, replace = TRUE,
                   prob = c(.2, .2, .15, .10, .10, .20, .15, .02))
)
dsN[sample(which(dsN$yearF==2001), 5), "attendF"]<-NA
dsN[sample(which(dsN$yearF==2002), 10), "attendF"]<-NA
dsN[sample(which(dsN$yearF==2003), 15), "attendF"]<-NA
dsN[sample(which(dsN$yearF==2004), 20), "attendF"]<-NA
attcol8<-c("Never"="#4575b4",
"Once or Twice"="#74add1",
"Less than once/month"="#abd9e9",
"About once/month"="#e0f3f8",
"About twice/month"="#fee090",
"About once/week"="#fdae61",
"Several times/week"="#f46d43",
"Everyday"="#d73027")
dsN$attendF<-factor(dsN$attendF, levels=1:8, labels=names(attcol8))
head(dsN,13)
id yearF attendF
1 1 2001 About once/week
2 1 2002 About once/month
3 1 2003 About once/week
4 1 2004 <NA>
5 2 2001 Less than once/month
6 2 2002 About once/week
7 2 2003 About once/week
8 2 2004 Several times/week
9 3 2001 Once or Twice
10 3 2002 About once/week
11 3 2003 <NA>
12 3 2004 Once or Twice
13 4 2001 Several times/week
we can produce a series of stacked bar charts:
require(ggplot2)
# p<- ggplot( subset(dsN,!is.na(attendF)), aes(x=yearF, fill=attendF)) # leaving NA out of calculations
p<- ggplot( dsN, aes(x=yearF, fill=attendF)) # keeping NA in calculations
p<- p+ geom_bar(position="fill")
p<- p+ scale_fill_manual(values = attcol8,
name="Response category" )
p<- p+ scale_y_continuous("Prevalence: proportion of total",
limits=c(0, 1),
breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<- p+ scale_x_discrete("Waves of measurement",
limits=as.character(c(2000:2005)))
p<- p+ labs(title="In the past year, how often have you attended a worship service?")
p
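The plot above is built from the raw data. Sometimes, however, it is more convenient to build a plot from summarized data, especially when you need control over the statistical function. Below is the transformation of dsN into ds, which contains only the values that are actually mapped onto the plot above: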
require(dplyr)
ds <- dsN %>%
  dplyr::filter(!is.na(attendF)) %>%
  dplyr::group_by(yearF, attendF) %>%
  dplyr::summarize(count = dplyr::n()) %>%    # number of respondents per wave x category
  dplyr::mutate(total = sum(count),           # still grouped by yearF: total per wave
                percent = count / total)
head(ds,10)
Source: local data frame [10 x 5]
Groups: yearF
yearF attendF count total percent
1 2001 Never 18 95 0.18947
2 2001 Once or Twice 18 95 0.18947
3 2001 Less than once/month 10 95 0.10526
4 2001 About once/month 8 95 0.08421
5 2001 About twice/month 8 95 0.08421
6 2001 About once/week 15 95 0.15789
7 2001 Several times/week 17 95 0.17895
8 2001 Everyday 1 95 0.01053
9 2002 Never 11 90 0.12222
10 2002 Once or Twice 22 90 0.24444
# verify
summarize(filter(ds, yearF==2001), should.be.one=sum(percent))
Source: local data frame [1 x 2]
yearF should.be.one
1 2001 1
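As an independent cross-check of ds, the same per-wave proportions can be tabulated in base R without dplyr. `table()` skips NA values by default, so each wave's row total is its number of non-missing responses, and `prop.table(..., margin = 1)` gives the within-wave proportions that the percent column should hold. This is a sketch under one assumption: dsN is regenerated with a hypothetical seed (`set.seed(1)`) so the snippet runs on its own, which means its individual counts will not match the output shown above; only the per-wave totals are fixed by the missingness pattern.

```r
# Cross-check of the summary using base R only (no dplyr).
# Assumption: dsN is rebuilt with a hypothetical seed so the snippet is
# self-contained; its individual counts will differ from the output above.
set.seed(1)
levs <- c("Never", "Once or Twice", "Less than once/month", "About once/month",
          "About twice/month", "About once/week", "Several times/week", "Everyday")
dsN <- data.frame(
  id = rep(1:100, each = 4),
  yearF = factor(rep(2001:2004, 100)),
  attendF = factor(sample(1:8, 400, replace = TRUE,
                          prob = c(.2, .2, .15, .10, .10, .20, .15, .02)),
                   levels = 1:8, labels = levs)
)
# Same missingness pattern as the original: 5, 10, 15 and 20 NAs per wave.
n_na <- c("2001" = 5, "2002" = 10, "2003" = 15, "2004" = 20)
for (yr in names(n_na)) {
  dsN[sample(which(dsN$yearF == yr), n_na[[yr]]), "attendF"] <- NA
}

tab  <- table(dsN$yearF, dsN$attendF)  # wave x category counts; NAs are not tabulated
prop <- prop.table(tab, margin = 1)    # within-wave proportions (what ds$percent holds)
rowSums(tab)   # non-missing responses per wave: 95, 90, 85, 80
rowSums(prop)  # 1 for every wave
```

Because each wave has 100 rows and a fixed number of them is set to NA, the row totals (95, 90, 85, 80) do not depend on the seed, which makes them a convenient sanity check on the total column.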
Question:
How can I recreate the plot above from this summary dataset, ds?
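One possible approach, sketched under stated assumptions rather than offered as the definitive answer: since ds already holds the proportions, map percent to y and tell the bar layer not to count rows, using `geom_col()` (shorthand for `geom_bar(stat = "identity")`). Because ds drops NA responses, the result corresponds to the commented-out `subset(dsN, !is.na(attendF))` variant rather than the NA-keeping plot. The sketch assumes ggplot2 >= 2.2.0 (for `geom_col`) and dplyr >= 1.0.0 (for the `.groups` argument), and rebuilds dsN with a hypothetical seed so it is self-contained. It also swaps the y-axis `limits` for `coord_cartesian(ylim = c(0, 1))`, on the assumption that a stacked top of 1 plus rounding error could otherwise be censored by the scale limits.

```r
# Sketch: rebuild the stacked-proportion chart from the summary table `ds`.
# Assumes ggplot2 >= 2.2.0 (geom_col) and dplyr >= 1.0.0 (.groups argument).
library(ggplot2)
library(dplyr)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
attcol8 <- c("Never" = "#4575b4", "Once or Twice" = "#74add1",
             "Less than once/month" = "#abd9e9", "About once/month" = "#e0f3f8",
             "About twice/month" = "#fee090", "About once/week" = "#fdae61",
             "Several times/week" = "#f46d43", "Everyday" = "#d73027")
dsN <- data.frame(
  id = rep(1:100, each = 4),
  yearF = factor(rep(2001:2004, 100)),
  attendF = factor(sample(1:8, 400, replace = TRUE,
                          prob = c(.2, .2, .15, .10, .10, .20, .15, .02)),
                   levels = 1:8, labels = names(attcol8))
)
n_na <- c("2001" = 5, "2002" = 10, "2003" = 15, "2004" = 20)
for (yr in names(n_na)) {
  dsN[sample(which(dsN$yearF == yr), n_na[[yr]]), "attendF"] <- NA
}

# Summary table: one row per wave x category, NA responses dropped.
ds <- dsN %>%
  filter(!is.na(attendF)) %>%
  group_by(yearF, attendF) %>%
  summarize(count = n(), .groups = "drop_last") %>%  # result stays grouped by yearF
  mutate(total = sum(count), percent = count / total) %>%
  ungroup()

# Bar heights come straight from `percent`, so no statistic is computed;
# the default position = "stack" piles the categories up to 1 per wave.
p <- ggplot(ds, aes(x = yearF, y = percent, fill = attendF)) +
  geom_col() +
  scale_fill_manual(values = attcol8, name = "Response category") +
  scale_y_continuous("Prevalence: proportion of total",
                     breaks = seq(0.1, 1, by = 0.1)) +
  scale_x_discrete("Waves of measurement",
                   limits = as.character(2000:2005)) +
  coord_cartesian(ylim = c(0, 1)) +  # zooms instead of censoring out-of-range bars
  labs(title = "In the past year, how often have you attended a worship service?")
p
```

Keeping the proportion computation in dplyr and using an identity statistic in ggplot2 is what gives control over the statistical function: any other summary (weighted shares, for example) can be placed in percent without changing the plotting code.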