
I have a large data frame of beer reviews that contains duplicate reviews, with some discrepancies among the duplicates.

> head( beer_data ) 

  brewery_id            brewery_name review_time review_overall
1      10325         Vecchio Birraio  1234817823            1.5
2      10325         Vecchio Birraio  1235915097            3.0
3      10325         Vecchio Birraio  1235916604            3.0
4      10325         Vecchio Birraio  1234725145            3.0
5       1075 Caldera Brewing Company  1293735206            4.0
6       1075 Caldera Brewing Company  1325524659            3.0
  review_aroma review_appearance review_profilename
1          2.0               2.5            stcules
2          2.5               3.0            stcules
3          2.5               3.0            stcules
4          3.0               3.5            stcules
5          4.5               4.0     johnmichaelsen
6          3.5               3.5            oline73
                      beer_style review_palate review_taste
1                     Hefeweizen           1.5          1.5
2             English Strong Ale           3.0          3.0
3         Foreign / Export Stout           3.0          3.0
4                German Pilsener           2.5          3.0
5 American Double / Imperial IPA           4.0          4.5
6           Herbed / Spiced Beer           3.0          3.5
               beer_name beer_abv beer_beerid
1           Sausa Weizen      5.0       47986
2               Red Moon      6.2       48213
3 Black Horse Black Beer      6.5       48215
4             Sausa Pils      5.0       47969
5          Cauldron DIPA      7.7       64883
6    Caldera Ginger Beer      4.7       52159
> 

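I would like to use ddply to aggregate the columns of the duplicated beer reviews into a new, smaller data frame for analysis. Is this possible with ddply?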

1 Answer

How about something like this?

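library(plyr)

# One summary row per brewery that has more than one review.
# Groups with a single row return NULL, which ddply drops from the result.
duplicate_data <- ddply(beer_data, .(brewery_id), function(x) {
    if (nrow(x) > 1) {
        return(data.frame(brewery_id = unique(x$brewery_id), mean_ratings = mean(x$review_overall)))
        # You can fill in the rest
    }
})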
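
If "duplicate" means the same reviewer rating the same beer more than once (an assumption; the question does not define it), a rough sketch with ddply and summarise could look like the following. The toy data frame and the output column names n_reviews, mean_overall and mean_taste are made up for illustration.

```r
library(plyr)

# Toy stand-in for beer_data; the values are invented for illustration only.
beer_data <- data.frame(
    review_profilename = c("stcules", "stcules", "oline73"),
    beer_beerid        = c(47986, 47986, 52159),
    review_overall     = c(1.5, 2.5, 3.0),
    review_taste       = c(1.5, 2.0, 3.5)
)

# Assumption: rows sharing reviewer and beer id are duplicates of one review.
# Collapse each such group to one row, averaging the scores and counting rows.
collapsed <- ddply(beer_data, .(review_profilename, beer_beerid), summarise,
    n_reviews    = length(review_overall),
    mean_overall = mean(review_overall),
    mean_taste   = mean(review_taste)
)
```

Filtering on n_reviews > 1 afterwards would keep only the reviews that were actually duplicated, which matches the nrow(x) > 1 check above.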
Answered 2013-08-19T19:45:14.000