Suppose I have a df:

> ID<-c("A","A","A","B","B","B","B","C","C","C","C")
> attr<-c("yes1","yes1","no","yes2","yes1","yes1","yes1","no","no","yes1","yes2")
> df = data.frame(ID, attr) ; df
   ID attr
1   A yes1
2   A yes1
3   A   no
4   B yes2
5   B yes1
6   B yes1
7   B yes1
8   C   no
9   C   no
10  C yes1
11  C yes2

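with thousands of IDs. I would like to add another column that gives the percentage of "yes" attributes for each ID, along with the number of "no" attributes (so I can tell whether there is only one):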

     ID    %yes   #no
1     A    66.7     1
2     B     100     0
3     C      50     2

Is there a way to consolidate rows, similar to SQL GROUP BY? Ultimately, this new df would be used to classify the IDs and be added back to the original df:

     ID    attr    result
1     A    yes1       Pos
2     A    yes1       Pos
3     A      no     False
4     B    yes2   TruePos
5     B    yes1   TruePos
6     B    yes1   TruePos
7     B    yes1   TruePos
8     C      no     False
9     C      no     False
10    C    yes1       Pos
11    C    yes2       Pos

2 Answers

Take a look at the data.table package:

Load the package and convert your data.frame to a data.table. Use key= to specify your grouping column.

library(data.table)
DT <- data.table(df, key="ID")

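Perform your aggregation. Note that pct here is a proportion between 0 and 1; multiply by 100 for a percentage.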

DT2 <- DT[, list(pct = length(grep("yes", attr))/length(attr),
                 no = sum(attr == "no")), by=key(DT)]
DT2
#    ID       pct no
# 1:  A 0.6666667  1
# 2:  B 1.0000000  0
# 3:  C 0.5000000  2
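
The aggregation above returns one row per ID, while the question also asks for a per-row result column on the original data. Below is a minimal sketch of one way to do that with data.table's `:=` and `by=`, which adds group-level values to every row without a separate join. The classification rule is an assumption inferred from the desired output in the question ("no" rows become "False", "yes" rows in an all-"yes" ID become "TruePos", all other rows become "Pos"); the column names `pct`, `no`, and `result` are illustrative.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  ID   = c("A","A","A","B","B","B","B","C","C","C","C"),
  attr = c("yes1","yes1","no","yes2","yes1","yes1","yes1","no","no","yes1","yes2"),
  stringsAsFactors = FALSE
)
DT <- as.data.table(df)

# Add per-ID summaries to every row (by reference); row order is preserved
DT[, `:=`(pct = mean(grepl("yes", attr)),   # proportion of "yes*" rows in the ID
          no  = sum(attr == "no")),         # number of "no" rows in the ID
   by = ID]

# Assumed rule, inferred from the expected table in the question
DT[, result := ifelse(attr == "no", "False",
               ifelse(pct == 1, "TruePos", "Pos"))]

print(DT)
```

Using `:=` inside a grouped call avoids building `DT2` and merging it back, which may matter with thousands of IDs; if the one-row-per-ID summary is also needed, the aggregation shown earlier still applies.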
answered 2012-11-29T18:36:35.943

This will give you the proportion of "yes" entries for each level of ID:

by(substr(df$attr,1,3)=="yes",INDICES=df$ID,FUN=mean)

This will tell you the number of "no" entries for each level of ID:

by(df$attr=="no",INDICES=df$ID,FUN=sum)
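
The two `by()` calls return separate objects rather than a single table. As a rough sketch in base R only, the same two summaries can be collected into one data frame with `tapply()` and then attached back to the original rows with `match()`. The names `is_yes`, `summary_df`, `pct_yes`, and `n_no` are illustrative and not part of the original answer.

```r
# Example data from the question
df <- data.frame(
  ID   = c("A","A","A","B","B","B","B","C","C","C","C"),
  attr = c("yes1","yes1","no","yes2","yes1","yes1","yes1","no","no","yes1","yes2"),
  stringsAsFactors = FALSE
)

# TRUE where the attribute starts with "yes"
is_yes <- substr(df$attr, 1, 3) == "yes"

# Group-wise summaries, one value per ID
pct_yes <- tapply(is_yes, df$ID, mean) * 100      # percentage of "yes" per ID
n_no    <- tapply(df$attr == "no", df$ID, sum)    # count of "no" per ID

# One row per ID, similar to a SQL GROUP BY result
summary_df <- data.frame(
  ID      = names(pct_yes),
  pct_yes = as.vector(pct_yes),
  n_no    = as.vector(n_no),
  stringsAsFactors = FALSE
)

# Attach the per-ID values to the original rows, keeping the row order
idx <- match(df$ID, summary_df$ID)
df$pct_yes <- summary_df$pct_yes[idx]
df$n_no    <- summary_df$n_no[idx]

print(summary_df)
```

`match()` is used here instead of `merge()` because `merge()` does not promise to keep the original row order.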
answered 2012-11-29T18:27:59.677