r - Subset a data frame by number of repetitions

Question

If I have a data frame like this:

neu <- data.frame(test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14), 
                  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f"))
neu
   test1 test2
1      1     a
2      2     b
3      3     a
4      4     b
5      5     c
6      6     c
7      7     a
8      8     c
9      9     c
10    10     d
11    11     d
12    12     f
13    13     f
14    14     f

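I want to select only those rows whose test2 factor level occurs more than three times. What is the fastest way to do this?

Thanks a lot. I did not really find the right answer in previous questions.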

score 7 · Accepted Answer

Find the values of test2 to keep with:

z <- table(neu$test2)[table(neu$test2) >= 3] # levels repeated 3 or more times; use > 3 for strictly more than three

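Or, to get the level names directly (z is then a character vector, so use z itself in place of names(z) when subsetting):

z <- names(which(table(neu$test2)>=3))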

Then subset:

subset(neu, test2 %in% names(z))

Or:

neu[neu$test2 %in% names(z),]
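
For reference, here is a minimal end-to-end sketch of the table approach on the question's data. The names counts, keep_ge3, keep_gt3, res_ge3 and res_gt3 are made up for illustration. It also shows how the result changes between "at least 3" (>= 3) and "more than three" (> 3), since the question asks for the latter.

```r
# Data from the question
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# Count how often each value of test2 occurs
counts <- table(neu$test2)

# Values occurring at least 3 times, and strictly more than 3 times
keep_ge3 <- names(counts[counts >= 3])
keep_gt3 <- names(counts[counts > 3])

# Keep only the rows whose test2 value is in the chosen set
res_ge3 <- neu[neu$test2 %in% keep_ge3, ]
res_gt3 <- neu[neu$test2 %in% keep_gt3, ]
```

With this data, the >= 3 version keeps the rows for a, c and f, while the > 3 version keeps only the rows for c.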

score 5 · Accepted Answer

Here is another way:

 with(neu, neu[ave(seq(test2), test2, FUN=length) > 3, ])

#   test1 test2
# 5     5     c
# 6     6     c
# 8     8     c
# 9     9     c
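
To see why this works, it can help to look at the intermediate vector that ave builds. The sketch below uses seq_along in place of seq (an equivalent choice here), and the names n_per_row and res are just for illustration.

```r
# Data from the question
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# For every row, ave() returns the size of the group that row belongs to:
# the row indices are split by test2, length() is applied to each piece,
# and the result is put back in the original row positions.
n_per_row <- ave(seq_along(neu$test2), neu$test2, FUN = length)

# Rows whose group has more than 3 members
res <- neu[n_per_row > 3, ]
```

Because n_per_row is aligned with the rows of neu, it can be used directly as a logical filter without building a separate lookup of level names.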

score 3 · Accepted Answer

I would use count from the plyr package to do the counting:

library(plyr)
count_result = count(neu, "test2")
matching = with(count_result, test2[freq > 3])
with(neu, test1[test2 %in% matching])
[1] 5 6 8 9

score 2 · Accepted Answer

The (better-scaling) data.table way:

library(data.table)
dt = data.table(neu)

dt[dt[, .I[.N >= 3], by = test2]$V1]

Note: hopefully, in the future, the following simpler syntax will also be a fast way to do this:

dt[, .SD[.N >= 3], by = test2]

(cf. Subset by group with data.table)
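
As a rough sketch of what the two forms above do, assuming the data.table package is installed (the guard, and the names idx, res_i and res_sd, are added here for illustration): within each test2 group, .N is the group size and .I holds the row numbers of that group in dt, so .I[.N >= 3] returns all row numbers of a group with at least 3 rows and nothing otherwise.

```r
# Data from the question
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# Only run if data.table is available (assumption: it may not be installed)
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(neu)

  # Row numbers of all rows belonging to groups with at least 3 rows
  idx <- dt[, .I[.N >= 3], by = test2]$V1

  # Form 1: index dt by the collected row numbers
  res_i <- dt[idx]

  # Form 2: subset .SD (the current group's rows) inside each group
  res_sd <- dt[, .SD[.N >= 3], by = test2]
}
```

Both forms should select the same rows. In each result the rows come out grouped by test2 in order of first appearance, and res_sd has test2 as its first column.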
