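Trying a different approach, I had a go at this plot: one boxplot per day showing the distribution of message counts across users, plus a line connecting the daily mean number of messages per user. Here is the target plot:
I started by generating the data using @Sacha Epskamp's approach. I generated a large data set so that the intended plot would have something to show.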
library("ggplot2")
library("lubridate")
# This code is from Sacha Epskamp
# http://stackoverflow.com/a/10269840/1290420
# Generate a data set
set.seed(1)
start <- strptime("2012-01-05 00:00:00",
                  format = "%Y-%m-%d %H:%M:%S")
end <- strptime("2012-03-05 00:00:00",
                format = "%Y-%m-%d %H:%M:%S")
df <- data.frame(message.id = 1:10000,
                 user.id = sample(1:30, 10000,
                                  replace = TRUE,
                                  prob = 1:30),
                 message.date = seq(start,
                                    end,
                                    length.out = 10000)
                 )
Then I struggled to wrangle the data frame into a shape suitable for the plot. I am sure a plyr guru could improve this considerably.
# Clean up the data frame and add a column
# with combined day-user
df$day <- yday(df$message.date)
# Drop day 65 (the end boundary) and keep only user.id and day
df <- df[ df$day != 65, c("user.id", "day") ]
df$day.user <- paste(df$day, df$user.id, sep = "-")
# Copy into new data frame with counts for each
# day-user combination
df2 <- aggregate(df,
                 by = list(df$day,
                           df$day.user),
                 FUN = "length"
                 )
# Keep the two grouping columns and one count column
df2 <- df2[, c(1, 2, 3)]
names(df2) <- c("day", "user", "count")
# Strip the "day-" prefix so only the user id remains
df2$user <- gsub(".+-(.+)", "\\1", df2$user)
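As a possibly simpler alternative to the reshaping above, the per-day, per-user counts can be computed in a single `aggregate()` call, without the pasted `day.user` key or the `gsub()` step. This is only a minimal sketch in base R, not the approach used in the rest of this answer: the names `msgs` and `counts` are made up for illustration, and the day of year is taken from `as.POSIXlt()` instead of `lubridate::yday()`, with the time zone fixed to UTC as an assumption.

```r
# Self-contained sketch: count messages per (day, user) with base R only.
# `msgs` and `counts` are illustrative names, not part of the original answer.
set.seed(1)
start <- as.POSIXct("2012-01-05 00:00:00", tz = "UTC")
end   <- as.POSIXct("2012-03-05 00:00:00", tz = "UTC")
msgs <- data.frame(
  user.id      = sample(1:30, 10000, replace = TRUE, prob = 1:30),
  message.date = seq(start, end, length.out = 10000)
)

# Day of year; POSIXlt's yday is zero-based, so add 1 to match lubridate::yday()
msgs$day <- as.POSIXlt(msgs$message.date)$yday + 1

# Drop the boundary day, as in the code above
msgs <- msgs[msgs$day != 65, ]

# One row per (day, user) pair that occurs, with the number of messages
counts <- aggregate(x   = list(count = msgs$user.id),
                    by  = list(day = msgs$day, user = msgs$user.id),
                    FUN = length)
head(counts)
```

`counts` has the same `day`, `user` and `count` columns as `df2`, so it should be usable in the plotting code in its place; the one difference is that `user` stays an integer rather than a string.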
After that, drawing the plot is the easy part:
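p <- ggplot(df2,
            aes(x = day,
                y = count))
# One grey box per day showing the spread of per-user counts
p <- p + geom_boxplot(aes(group = day), colour = "grey80")
# Daily mean as a line. `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
# and `linewidth` replaces `size` for lines (ggplot2 >= 3.4.0);
# on older versions use fun.y and size instead.
p <- p + stat_summary(fun = mean,
                      colour = "steelblue",
                      geom = "line",
                      linewidth = 1)
# Daily mean as red points on top of the line
p <- p + stat_summary(fun = mean,
                      colour = "red",
                      geom = "point",
                      size = 3)
p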