Combine them into a single "long" data frame that has a grouping column marking which data frame each row came from.
library(reshape2)
library(dplyr)
library(ggplot2)  # needed for the ggplot() calls below
# Individual data frames
a = data.frame(a = sample(1:10, 20, replace = TRUE))
b = data.frame(b = sample(1:11, 19, replace = TRUE))
c = data.frame(c = sample(1:9, 21, replace = TRUE))
Combine the data frames in "long" format. The data frames have different numbers of rows, so the new grouping variable (called data_source below) must repeat each data frame's name once per row of that data frame. The rep function takes care of this. One way is rep(c("a","b","c"), times=c(nrow(a), nrow(b), nrow(c))); however, I use sapply below because it seemed cleaner (though perhaps more opaque).
df = data.frame(value = c(a$a, b$b, c$c),
                data_source = rep(c("a","b","c"), times = sapply(list(a,b,c), nrow)))
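As a quick sanity check that the two ways of building the grouping vector agree, here is a minimal sketch on two toy data frames (x and y are made-up names for illustration, not part of the data above):

```r
# Toy data frames with different row counts (illustrative only)
x <- data.frame(v = 1:3)
y <- data.frame(v = 1:2)

# Explicit version: one nrow() call per data frame
explicit <- rep(c("x", "y"), times = c(nrow(x), nrow(y)))

# sapply version: nrow() applied over a list of the data frames
via_sapply <- rep(c("x", "y"), times = sapply(list(x, y), nrow))

print(explicit)
print(identical(explicit, via_sapply))
```

The sapply form mainly pays off when there are many data frames, since the list can be built once and reused.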
# Pre-summarise counts in order to add zero counts for empty categories
df.summary = df %>% group_by(data_source, value) %>%
  tally() %>%
  dcast(data_source ~ value, value.var="n", fill=0) %>%
  melt(id.vars="data_source", variable.name="value", value.name="n")
ggplot(df.summary, aes(value, n, fill=data_source)) +
geom_bar(stat="identity", position="dodge", colour="grey20", lwd=0.3)
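To see what the dcast/melt round trip in the pre-summarise step is doing, here is a small sketch on made-up data (toy, counts and filled are hypothetical names). It assumes dplyr and reshape2 are installed, and uses count() as a shorthand for group_by() followed by tally():

```r
library(dplyr)
library(reshape2)

# Toy data (made up): source "b" never takes the value 2
toy <- data.frame(value = c(1, 2, 2, 1),
                  data_source = c("a", "a", "a", "b"))

# Plain counts: only combinations that actually occur get a row
counts <- toy %>% count(data_source, value)

# Going wide creates a cell for every source/value pair (fill = 0 for the
# missing ones); melting back to long keeps those zero rows
filled <- counts %>%
  dcast(data_source ~ value, value.var = "n", fill = 0) %>%
  melt(id.vars = "data_source", variable.name = "value", value.name = "n")

print(counts)
print(filled)
```

The filled result has one row per source/value combination, so every group has a bar slot at every value and the dodged bars keep a constant width.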
If we didn't have some categories with zero counts (for example, data frames b and c have no values equal to 10), then we could just do this:
ggplot(df, aes(factor(value), fill=data_source)) +
geom_bar(position="dodge", colour="grey20", lwd=0.3)
But then note how ggplot expands the remaining bars when one or two data frames don't contain a given value: