1

I'm trying to figure out the five fastest-growing topics for each quarter. I have a dataframe (will call it df) in R with three columns - a quarter number (df$QNum), the topic (df$Topic) and the number of records of that topic that quarter (df$Total_Hits).

Here's an example of how my dataframe df looks:

    Total_Hits          Topic                 QNum
        10              Technology            1
        86              Video Conferencing    1
        14              Video Conferencing    2
        10              Technology            3
         1              Video Conferencing    1
        12              Technology            21

I want to create a new column in df, df$QonQGrowth that, for each record, calculates the growth of Hits on that Topic over the previous quarter. I don't mind how it looks for df$QNum=1 but for the third record in this example, it would calculate: (Total_Hits/(Total_Hits, where Topic="Video Conferencing" and QNum=1)-1)

I think it would look something like the following but can't quite figure it out:

df$QonQGrowth <- (df$Total_Hits / ([a lookup of Total_Hits for df$Topic and (df$Qnum-1)?]))-1

The data set is pretty large so it's possible that there won't be a record for every topic every quarter.

Similar questions here and here but they're not doing exactly what I need.

Edit: This question also seems like it might be useful, using ddply or aggregate.

Thanks so much in advance!

4

1 回答 1

4

我确信有一种更清洁的方法可以做到这一点,但这是一个丑陋的plyr解决方案:

> #make junk data
> set.seed(3)
> dat <- data.frame(total_hits=round(runif(25,0,50)),topic=sample(c("A","B","C","D"),25,T),qnum=round(runif(25,1,6)))
> dat <- dat[row.names(unique(dat[,2:3])),]
> dat
   total_hits topic qnum
1           8     D    2
2          40     C    2
3          19     D    5
4          16     C    6
6          30     B    6
7           6     B    3
8          15     A    2
9          29     B    2
11         26     B    5
12         25     D    1
13         27     A    5
18         35     A    4
19         45     C    3
20         14     B    1
23          6     A    6
25         12     D    4
> 
> #get you qonqgrowth variable
> library(plyr)
> ddply(dat,.(topic,qnum),summarize,qonqgrowth=ifelse(any(dat$qnum==qnum-1 & dat$topic == topic),total_hits/(dat$total_hits[dat$qnum == qnum-1 & dat$topic == topic]),NA))
   topic qnum qonqgrowth
1      A    2         NA
2      A    4         NA
3      A    5  0.7714286
4      A    6  0.2222222
5      B    1         NA
6      B    2  2.0714286
7      B    3  0.2068966
8      B    5         NA
9      B    6  1.1538462
10     C    2         NA
11     C    3  1.1250000
12     C    6         NA
13     D    1         NA
14     D    2  0.3200000
15     D    4         NA
16     D    5  1.5833333
于 2013-09-06T04:04:08.870 回答