4

I have a dataset that contains 4 different events-types (A, B, C, D) that happens a lot of times daily. I have such a log for over a year. The "EventType" attribute is a 'factor'.

For eg, my dataset looks like this:

DateTime,EventType
6/5/2013 9:35,B
6/5/2013 9:35,A
6/5/2013 9:35,B
6/5/2013 9:36,D
6/5/2013 9:39,A
6/5/2013 9:40,B
7/5/2013 9:35,B
7/5/2013 9:35,A
7/5/2013 9:35,B
7/5/2013 9:36,D
7/5/2013 9:39,A
7/5/2013 9:40,B
8/5/2013 9:35,A
8/5/2013 9:35,A
8/5/2013 9:35,B
8/5/2013 9:36,B
8/5/2013 9:39,A
8/5/2013 9:40,B
9/5/2013 9:35,B
9/5/2013 9:35,B
9/5/2013 9:35,B
9/5/2013 9:36,D
9/5/2013 9:39,A
9/5/2013 9:40,A

I want to plot the total-count of all the event-types on a daily basis. x-axis: date-time, Y-axis: count.

I like to try ddply to accomplish this, but, I am not very sure how to go about it. This is what I have done:

data <- read.csv("C:/analytics/mydata.csv", sep=",", header=TRUE)
k <- ddply(data, "data$DateTime", function(x) count = nrow(x))

The above gives the following output:

       data$DateTime V1
1  6/5/2013 9:35,A  1
2  6/5/2013 9:35,B  2
3  6/5/2013 9:36,D  1
4  6/5/2013 9:39,A  1
5  6/5/2013 9:40,B  1
6  7/5/2013 9:35,A  1
7  7/5/2013 9:35,B  2
8  7/5/2013 9:36,D  1
9  7/5/2013 9:39,A  1
10 7/5/2013 9:40,B  1
11 8/5/2013 9:35,A  2
12 8/5/2013 9:35,B  1
13 8/5/2013 9:36,B  1
14 8/5/2013 9:39,A  1
15 8/5/2013 9:40,B  1
16 9/5/2013 9:35,B  3
17 9/5/2013 9:36,D  1
18 9/5/2013 9:39,A  1
19 9/5/2013 9:40,A  1

My Question: How do I achieve the same behavior if I want to get the counts by day or month? I want to use lubridate to get day or month, but, after that, I do not know how to use that to group and subsequently to get the counts.

Something like k <- ddply(data, "day(data$EventType)", function(x) count = nrow(x))

Once I have it, I can believe I can plot them nicely. Your inputs are very much appreciated.

Thanks.

4

1 回答 1

5

有几种方法可以做到这一点。最主要的是确保您正在使用Date/Time类。有一种方法可以将POSIX时间舍入到一天,然后您可以使用一些聚合函数来计算每天的事件:

#  Make sure your character strings represent date and times and then round to days
df[,1]<- as.POSIXct(df[,1],format="%d/%m/%Y %H:%M")
df$Day <- as.character( round(df[,1] , "day" ) )

ddply按照您最初的意图使用...

require(plyr)
ddply( df , .(Day) , summarise , Count = length(EventType) )
Day Count
1 2013-05-06     6
2 2013-05-07     6
3 2013-05-08     6
4 2013-05-09     6

一个base解决办法……

aggregate( df , by = list(df$Day) , length )
     Group.1 DateTime EventType Day
1 2013-05-06        6         6   6
2 2013-05-07        6         6   6
3 2013-05-08        6         6   6
4 2013-05-09        6         6   6
于 2013-06-18T21:48:12.313 回答