r - Getting values that appear exactly n-times

Question

I started thinking about this problem while trying to get the values from a vector that are not repeated. unique is not suitable (as far as I can tell from the documentation) because it still returns the repeated elements, just once each. duplicated has the same problem, since it returns FALSE the first time it encounters a value that is duplicated. This was my workaround:

> d=c(1,2,4,3,4,6,7,8,5,10,3)
> setdiff(d,unique(d[duplicated(d)]))
[1]  1  2  6  7  8  5 10

The following is a more general approach

> table(d)->g
> as.numeric(names(g[g==1]))
[1]  1  2  5  6  7  8 10

which we can generalize to counts other than 1. But I find this solution a bit clumsy, since it converts strings back to numbers. Is there a better or more straightforward way to get this vector?

score 4 · Accepted Answer

You can sort the values and then use rle to get the values that occur n times in a row.

rl <- rle(sort(d))

rl$values[rl$lengths==1]
## [1]  1  2  5  6  7  8 10

rl$values[rl$lengths==2]
## [1] 3 4
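
The same idea can be wrapped in a small helper so that n becomes a parameter. This is only a sketch: the function name values_with_count is made up for illustration and is not part of base R. Note that sort() drops NA by default, so missing values are never counted here.

```r
# Hypothetical helper (the name is an assumption, not a base R function):
# return the sorted distinct values of x that occur exactly n times.
values_with_count <- function(x, n = 1) {
  rl <- rle(sort(x))          # sorting makes equal values adjacent
  rl$values[rl$lengths == n]  # keep the runs whose length is exactly n
}

d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)
values_with_count(d, 1)
## [1]  1  2  5  6  7  8 10
values_with_count(d, 2)
## [1] 3 4
```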

score 3 · Accepted Answer

You can also do something like this in base R.

as.numeric(levels(factor(d))[tabulate(factor(d)) == 1])
# [1]  1  2  5  6  7  8 10

I've used factor and levels to make the approach more general (so "d" can include negative values and 0).
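
To see why the factor step matters, here is a small check, assuming I am reading ?tabulate correctly: given a plain numeric vector, tabulate only counts positive integers and silently ignores everything else. The vector x below is made up for the demonstration.

```r
x <- c(-2, 0, 0, 3, 3, 5)   # made-up data with a negative value and zeros

# On raw numbers, tabulate() only counts the bins 1..max(x),
# so -2 and 0 are silently dropped.
tabulate(x)
## [1] 0 0 2 0 1

# Going through factor() maps every distinct value to a level first,
# so nothing is lost.
f <- factor(x)
as.numeric(levels(f)[tabulate(f) == 1])
## [1] -2  5
```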

Of course, even for something like this you can expect a performance boost from "data.table", with which you can do the following:

library(data.table)
as.data.table(d)[, .N, by = d][N == 1]$d
# [1]  1  2  6  7  8  5 10

score 2 · Accepted Answer

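A one-liner is completely unnecessary here, but one-liners are always nice.

Suppose you want to find all elements that occur 2 times. Using the plyr package: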

library(plyr)
count(d)$x[count(d)$freq==2]
#Output
#[1] 3 4

score 1 · Accepted Answer

You can use duplicated for n=1; just call it twice and use the fromLast argument.

sort(d[! (duplicated(d) | duplicated(d, fromLast=TRUE))])
# [1]  1  2  5  6  7  8 10
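
The duplicated trick only covers n = 1. For other counts, one possibility that still avoids the string round trip is to count positions in unique(d) with match and tabulate. This is a sketch rather than a tuned solution, and the names u and counts are just for illustration.

```r
d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)

u <- unique(d)                                      # distinct values, in order of first appearance
counts <- tabulate(match(d, u), nbins = length(u))  # number of occurrences of each u[i]

u[counts == 1]
## [1]  1  2  6  7  8  5 10
u[counts == 2]
## [1] 4 3
```

Unlike the rle and table versions, the result keeps the order of first appearance; wrap it in sort() if sorted output is wanted.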

score 1 · Accepted Answer

I like the other answers better, but this seemed like a good excuse to test my dplyr skills:

library(dplyr)
as.data.frame(table(d)) %>%
  filter(Freq == 1) %>%
  select(d)
#    d
# 1  1
# 2  2
# 3  5
# 4  6
# 5  7
# 6  8
# 7 10
