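I have a problem with my data frame: the same person ID appears in several rows, with different expense values across categories such as markets, health, and cars. My data frame looks like this: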

Base=data.frame(ID=c("CED1","CED2","CED3","CED1","CED1","CED3","CED3","CED2","CED2","CED4"),Value=c(10,20,10,30,50,10,10,20,30,30),Categorie=c("Markets","Markets","Health","Cars","Cars","Health","Cars","Health","Cars","Markets"))

    ID   Value Categorie
1  CED1    10   Markets
2  CED2    20   Markets
3  CED3    10    Health
4  CED1    30      Cars
5  CED1    50      Cars
6  CED3    10    Health
7  CED3    10      Cars
8  CED2    20    Health
9  CED2    30      Cars
10 CED4    30   Markets

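As you can see, I have different IDs and categories. I want to compute metrics from this data frame into a new one, with one row per person, like this: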

ID   Total.Value   Max.Value  Min.Value  Average.Value  %Markets  %Health  %Cars
CED1     90           50         10           30           11%       0%      89%
CED2     70           30         20           23.33        28.5%    28.5%   42.8%
CED3     30           10         10           10           0%       66.7%    33.3%
CED4     30           30         30           30           100%      0%       0%

I am trying to build this data frame with plyr, but I am not getting the right metrics. Thanks for your help.

2 Answers

Here is a ddply solution.

library(plyr)
ddply(Base, .(ID), summarise, Total = sum(Value),
      Max.Value = max(Value),
      Min.Value = min(Value),
      Average.Value = mean(Value),
      "%Markets" = sum(Value[Categorie == "Markets"])/sum(Value) * 100,
      "%Health" = sum(Value[Categorie == "Health"])/sum(Value) * 100,
      "%Cars" = sum(Value[Categorie == "Cars"])/sum(Value) * 100)

Result:

    ID Total Max.Value Min.Value Average.Value  %Markets  %Health    %Cars
1 CED1    90        50        10      30.00000  11.11111  0.00000 88.88889
2 CED2    70        30        20      23.33333  28.57143 28.57143 42.85714
3 CED3    30        10        10      10.00000   0.00000 66.66667 33.33333
4 CED4    30        30        30      30.00000 100.00000  0.00000  0.00000
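
The three percentage columns above are written out one category at a time. As a cross-check, here is a minimal base R sketch that computes the same shares for whatever categories are present, assuming the `Base` data frame from the question. The names `totals` and `pct` are only illustrative.

```r
# Sample data from the question
Base <- data.frame(
  ID = c("CED1","CED2","CED3","CED1","CED1","CED3","CED3","CED2","CED2","CED4"),
  Value = c(10,20,10,30,50,10,10,20,30,30),
  Categorie = c("Markets","Markets","Health","Cars","Cars","Health","Cars","Health","Cars","Markets")
)

# Sum of Value for every ID x Categorie pair (pairs with no rows become 0)
totals <- xtabs(Value ~ ID + Categorie, data = Base)

# Divide each row by its row total and scale to percent
pct <- prop.table(totals, margin = 1) * 100
round(pct, 2)
```

Each row of `pct` sums to 100, and categories with no spending for an ID show up as 0 rather than NA.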
answered 2013-03-18T15:50:52.433

Here is a data.table solution:

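library(data.table)
dt <- data.table(Base, key="ID")
# tapply() needs a factor so that every group reports all categories (NA when absent);
# since R 4.0, data.frame() no longer converts strings to factors by default
dt[, Categorie := factor(Categorie)]
dt[, as.list(c(total=sum(Value), max=max(Value), 
    min=min(Value), mean=mean(Value), 
    tapply(Value, Categorie, sum)/sum(Value) * 100)), 
by=ID]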
#      ID total max min     mean     Cars   Health   Markets
# 1: CED1    90  50  10 30.00000 88.88889       NA  11.11111
# 2: CED2    70  30  20 23.33333 42.85714 28.57143  28.57143
# 3: CED3    30  10  10 10.00000 33.33333 66.66667        NA
# 4: CED4    30  30  30 30.00000       NA       NA 100.00000
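
If the NA entries in a result like the one above need to become 0 after the fact, one possible approach is to loop over the numeric columns with `set()`. This is only a sketch: the name `res` is illustrative, and `Categorie` is converted to a factor explicitly so that every group reports all three categories.

```r
library(data.table)

# Sample data from the question
Base <- data.frame(
  ID = c("CED1","CED2","CED3","CED1","CED1","CED3","CED3","CED2","CED2","CED4"),
  Value = c(10,20,10,30,50,10,10,20,30,30),
  Categorie = c("Markets","Markets","Health","Cars","Cars","Health","Cars","Health","Cars","Markets")
)

dt <- data.table(Base, key = "ID")
dt[, Categorie := factor(Categorie)]  # keep all levels within every group

# Per-ID summary; categories absent for an ID come back as NA
res <- dt[, as.list(c(total = sum(Value), max = max(Value),
                      min = min(Value), mean = mean(Value),
                      tapply(Value, Categorie, sum) / sum(Value) * 100)),
          by = ID]

# Replace NA by 0 in place, column by column (ID is skipped)
for (j in setdiff(names(res), "ID")) {
  set(res, i = which(is.na(res[[j]])), j = j, value = 0)
}
res
```

`set()` modifies `res` by reference, so no copy of the table is made.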

The NA values here can be replaced with 0. If you would rather get 0 directly instead of NA, then:

dt[, {tt <- tapply(Value, Categorie, sum)/sum(Value); ## compute ratio for percentage
      tt[is.na(tt)] <- 0; 
      as.list(c(total=sum(Value),                     ## total
          summary(Value)[c(6,1,4)],                   ## max, min and mean
          tt* 100))                                   ## percentages
     }, 
by=ID]

#      ID total Max. Min.  Mean     Cars   Health   Markets
# 1: CED1    90   50   10 30.00 88.88889  0.00000  11.11111
# 2: CED2    70   30   20 23.33 42.85714 28.57143  28.57143
# 3: CED3    30   10   10 10.00 33.33333 66.66667   0.00000
# 4: CED4    30   30   30 30.00  0.00000  0.00000 100.00000

Here I also show how to use the summary function to get several of the values at once, instead of writing them out one by one.
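
To make the positional index `c(6,1,4)` easier to read, the sketch below shows what `summary()` returns for a numeric vector and selects the same three entries by name. The vector `x` holds the CED1 values from the question, and `s` is just an illustrative name.

```r
# Values for CED1 in the question's data
x <- c(10, 30, 50)

# summary() on a numeric vector returns six named statistics, in this order:
# Min., 1st Qu., Median, Mean, 3rd Qu., Max.
names(summary(x))

# Positions 6, 1 and 4 are therefore Max., Min. and Mean;
# selecting by name gives the same entries and is less fragile
s <- summary(x)[c("Max.", "Min.", "Mean")]
s
```

Depending on the R version, `summary()` may round what it returns or only what it prints, which is presumably why the mean appears as 23.33 in the output above; `mean(Value)` avoids that ambiguity.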

answered 2013-03-18T15:58:21.157