
I started thinking about this problem while trying to get the values from a vector that are not repeated. unique is no good (as far as I can tell from the documentation) because it still returns the repeated elements, just once each. duplicated has the same problem, since it returns FALSE the first time it encounters a value that is later duplicated. This was my workaround:

> d=c(1,2,4,3,4,6,7,8,5,10,3)
> setdiff(d,unique(d[duplicated(d)]))
[1]  1  2  6  7  8  5 10

The following is a more general approach:

> table(d)->g
> as.numeric(names(g[g==1]))
[1]  1  2  5  6  7  8 10

This can be generalized to counts other than 1. But I find this solution a bit clumsy, since it converts strings back to numbers. Is there a better or more straightforward way to get this vector?


5 Answers


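You can sort the values and then use rle to get the values that occur n times in a row. After sorting, equal values sit next to each other, so each run length is the number of times that value occurs.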

rl <- rle(sort(d))

rl$values[rl$lengths==1]
## [1]  1  2  5  6  7  8 10

rl$values[rl$lengths==2]
## [1] 3 4
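The same idea can be wrapped in a small helper that takes the target count as a parameter. This is a minimal sketch; the name values_with_count_rle is hypothetical and not part of the answer.

```r
# Hypothetical helper: return the values of x that occur exactly n times,
# using the sort + rle approach from the answer above.
values_with_count_rle <- function(x, n) {
  rl <- rle(sort(x))           # runs of identical values after sorting
  rl$values[rl$lengths == n]   # keep values whose run length equals n
}

d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)
values_with_count_rle(d, 1)  # 1 2 5 6 7 8 10
values_with_count_rle(d, 2)  # 3 4
```

Note that sort() drops NA values by default, so NAs are never reported by this helper.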
answered 2014-09-30T14:56:13.217

You could also do something like this in base R:

as.numeric(levels(factor(d))[tabulate(factor(d)) == 1])
# [1]  1  2  5  6  7  8 10

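I've used factor and levels to make the approach more general (so "d" can include negative values and 0, which tabulate on its own would ignore).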
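A related base R variant, swapping in match() instead of factor(), avoids the conversion from character labels back to numbers that the question found clumsy. match() maps every element to a positive index into the distinct values, so negatives and 0 are handled as well. This is a sketch; the helper name values_with_count_match is hypothetical.

```r
# Hypothetical helper (not from the answer): values of x occurring exactly n times.
values_with_count_match <- function(x, n) {
  u <- unique(x)                                      # distinct values, in first-appearance order
  counts <- tabulate(match(x, u), nbins = length(u))  # occurrences of each distinct value
  u[counts == n]                                      # keep those seen exactly n times
}

d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)
values_with_count_match(d, 1)  # 1 2 6 7 8 5 10
values_with_count_match(d, 2)  # 4 3
```

Unlike the levels-based version, the result keeps the order of first appearance rather than sorted order; wrap it in sort() if sorted output is needed.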


Of course, even for something like this you can expect a performance boost from "data.table", with which you could do the following:

library(data.table)
as.data.table(d)[, .N, by = d][N == 1]$d
# [1]  1  2  6  7  8  5 10
answered 2014-09-30T15:28:13.850

A one-liner is completely unnecessary here, but one-liners are always nice.

Suppose you want to find all elements that occur exactly twice. Using the plyr package:

library(plyr)
count(d)$x[count(d)$freq==2]
#Output
#[1] 3 4
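If plyr isn't available, a base R one-liner in the same spirit is possible. This swaps in ave(), which the answer does not use: ave(d, d, FUN = length) replaces each element with the size of its group, so elements whose value occurs twice get 2.

```r
d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)
# Each element is replaced by how many times its value occurs in d.
group_size <- ave(d, d, FUN = length)
# Keep elements whose value occurs twice, then drop the repeats.
twice <- unique(d[group_size == 2])
twice  # 4 3
```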
answered 2014-09-30T15:08:30.307

You can use duplicated for n = 1: just call it twice, once with the fromLast argument.

sort(d[! (duplicated(d) | duplicated(d, fromLast=TRUE))])
# [1]  1  2  5  6  7  8 10
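To see why two calls are needed, the same expression can be split into its parts. duplicated() alone misses the first occurrence of a repeated value; the fromLast = TRUE scan catches it, because that occurrence has a later copy.

```r
d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)
# Flags every occurrence after the first one of each value.
dup_forward <- duplicated(d)
# Flags every occurrence before the last one of each value.
dup_backward <- duplicated(d, fromLast = TRUE)
# A value that occurs only once is flagged by neither scan.
singletons <- d[!(dup_forward | dup_backward)]
singletons  # 1 2 6 7 8 5 10
```

Without the outer sort(), the singletons come back in their original order. This trick only distinguishes "once" from "more than once", so it does not generalize to other counts.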
answered 2014-09-30T15:06:48.233

I prefer the other answers, but this seemed like a good excuse to test my dplyr skills:

library(dplyr)
as.data.frame(table(d)) %>%
  filter(Freq == 1) %>%
  select(d)
#    d
# 1  1
# 2  2
# 3  5
# 4  6
# 5  7
# 6  8
# 7 10
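The d column in this result is a factor, because as.data.frame(table(d)) turns the table labels into factor levels. So the string-to-number step from the question is still there. Calling as.numeric() directly on the factor returns the internal level codes, not the values. A base R sketch of the pitfall and the usual as.character() fix:

```r
d <- c(1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3)
freq <- as.data.frame(table(d))   # columns: d (factor) and Freq (integer)
once <- freq$d[freq$Freq == 1]    # still a factor with all original levels
as.numeric(once)                  # level codes: 10 comes back as 9
as.numeric(as.character(once))    # actual values: 1 2 5 6 7 8 10
```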
answered 2014-09-30T15:10:48.277