I would like to run ddply on a subset of my data, but the example below only returns TRUE or FALSE:

ddply(demoData, .(name, id, gender == "Male"), summarize, tot = sum(count))

ddply(demoData[demoData$gender == 'Male'], .(name, id, gender), summarize, tot = sum(count))

does not seem to work either. Ultimately, I need to sum "count" by name and id for all rows where gender == "Male".

Data sample, as requested:

id   name    gender     age      count
1    apple    Male      13-20      25
1    apple    Male      21-40      30
1    apple    Female    13-20      60
1    apple    Female    21-40      42
2    banana   Male      13-20      45
2    banana   Male      21-40      12
2    banana   Female    13-20      22
2    banana   Female    21-40      74

What I would like returned is:

1    apple    Male   55
2    banana   Male   57

2 Answers

Base R's aggregate can do this very simply:

aggregate(
          count ~ id + name + gender,
          FUN=sum, 
          subset=gender=="Male",
          data=demoData
         )

Result:

  id   name gender count
1  1  apple   Male    55
2  2 banana   Male    57
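
To reproduce this, here is a minimal sketch that rebuilds the sample data from the question and runs the same call. The column types and the use of stringsAsFactors = FALSE are assumptions, since the question only shows the printed table.

```r
# Rebuild the question's sample data (column types are an assumption)
demoData <- data.frame(
  id     = rep(1:2, each = 4),
  name   = rep(c("apple", "banana"), each = 4),
  gender = rep(c("Male", "Male", "Female", "Female"), times = 2),
  age    = rep(c("13-20", "21-40"), times = 4),
  count  = c(25, 30, 60, 42, 45, 12, 22, 74),
  stringsAsFactors = FALSE
)

# subset= is evaluated inside data, so only the Male rows are summed
res <- aggregate(count ~ id + name + gender, data = demoData,
                 FUN = sum, subset = gender == "Male")
print(res)
```

Because subset is applied before grouping, the Female rows never reach FUN, so no second filtering step is needed.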

If you absolutely must use plyr, because your life depends on it or something, then:

ddply(
   demoData[demoData$gender=="Male",],
   .(id, name, gender),
   summarise, 
   sumcount=sum(count)
  )

Giving:

  id   name gender sumcount
1  1  apple   Male       55
2  2 banana   Male       57
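
The comma after the logical condition is what matters here. Below is a small base-R sketch of the difference, using a rebuilt copy of the question's sample data (column types are an assumption). With a comma the condition selects rows; without it, R treats the logical vector as a column selector, which with this data should fail rather than return the Male rows.

```r
# Rebuilt sample data (column types are an assumption)
demoData <- data.frame(
  id     = rep(1:2, each = 4),
  name   = rep(c("apple", "banana"), each = 4),
  gender = rep(c("Male", "Male", "Female", "Female"), times = 2),
  age    = rep(c("13-20", "21-40"), times = 4),
  count  = c(25, 30, 60, 42, 45, 12, 22, 74),
  stringsAsFactors = FALSE
)

# With the comma: the logical vector indexes rows
males <- demoData[demoData$gender == "Male", ]

# Without the comma: the logical vector is used to pick columns.
# It has 8 entries but there are only 5 columns, so this errors;
# tryCatch captures the message instead of stopping the script.
err <- tryCatch(
  demoData[demoData$gender == "Male"],
  error = function(e) conditionMessage(e)
)

print(males)
print(err)
```

The row-indexed data frame, males, is what the ddply call above receives.
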
answered 2013-07-24T23:37:28.143

Even though ddply has no built-in subset argument,

ddply(subset(demoData, gender=="Male"),
    .(name, id), summarize, tot = sum(count))

seems to work fine...

    name id tot
1  apple  1  55
2 banana  2  57

...although the result has no Male column. For that, you need

ddply(subset(demoData, gender=="Male"),
    .(name, id, gender), summarize, tot = sum(count))
answered 2013-07-24T23:44:35.933