r - Determine the position of the i-th element in a vector

Question

I have a vector: a<-rep(sample(1:5,20, replace=T))

I determine the frequency of occurrence of each value:

tabulate(a)

I would now like to determine the position of the most frequently occurring values.

Suppose the vector is:

[1] 3 3 3 5 2 2 4 1 4 2 5 1 2 1 3 1 3 2 5 1

tabulate returns:

[1] 5 5 5 2 3

Now I determine the highest value returned by tabulate with max(tabulate(a))

This returns

[1] 5

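There are 3 values with a frequency of 5. I would like to know the positions of these values in the tabulate output.

That is, the first three entries of my tabulate result.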

score 1 · Accepted Answer

Perhaps it would be easier to use table:

x <- table(a)
x
# a
# 1 2 3 4 5 
# 5 5 5 2 3 
names(x)[x == max(x)]
# [1] "1" "2" "3"
which(a %in% names(x)[x == max(x)])
# [1]  1  2  3  5  6  8 10 12 13 14 15 16 17 18 20
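If what is wanted is the position within the table output itself (rather than within `a`), `which()` on the comparison should give it directly. A minimal sketch, assuming the example vector from the question; the name `pos` is only for illustration:

```r
# Example vector copied from the question
a <- c(3, 3, 3, 5, 2, 2, 4, 1, 4, 2, 5, 1, 2, 1, 3, 1, 3, 2, 5, 1)

x <- table(a)

# Indices of the entries in the table that reach the maximum count
pos <- unname(which(x == max(x)))
pos
# [1] 1 2 3
```

Here the positions happen to coincide with the values 1, 2 and 3 because every value from 1 to 5 occurs in `a`.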

Or, here is a similar approach with tabulate:

x <- tabulate(a)
sort(unique(a))[x == max(x)]
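One caveat with the tabulate version: `tabulate()` returns one count per integer bin from 1 up to the largest value, including zeros for values that never occur, so its result can be longer than `sort(unique(a))`. A sketch of the mismatch, using a hypothetical vector `a2` with gaps (not from the question), and `which()` as a possible alternative:

```r
# Hypothetical vector: the values 2 and 4 never occur
a2 <- c(1, 1, 3, 3, 5)

x2 <- tabulate(a2)   # counts for bins 1..5: 2 0 2 0 1

# Indexing the unique values with the longer logical vector misaligns
misaligned <- sort(unique(a2))[x2 == max(x2)]
misaligned
# [1] 1 5

# Since bin i of tabulate() counts the value i, which() gives the modes
modes <- which(x2 == max(x2))
modes
# [1] 1 3
```

When every integer from 1 to `max(a)` is present, as in the question's example, both forms should agree.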

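Here are some benchmarks on numeric and character vectors. The difference in performance is more noticeable with numeric data.

Sample data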

set.seed(1)
a <- sample(20, 1000000, replace = TRUE)
b <- sample(letters, 1000000, replace = TRUE)

Functions to benchmark

t1 <- function() {
  x <- table(a)
  out1 <- names(x)[x == max(x)]
  out1
}

t2 <- function() {
  x <- tabulate(a)
  out2 <- sort(unique(a))[x == max(x)]
  out2
}

t3 <- function() {
  x <- table(b)
  out3 <- names(x)[x == max(x)]
  out3
}

t4 <- function() {
  x <- tabulate(factor(b))
  out4 <- sort(unique(b))[x == max(x)]
  out4
}

Results

library(rbenchmark)
benchmark(t1(), t2(), t3(), t4(), replications =  50)
#   test replications elapsed relative user.self sys.self user.child sys.child
# 1 t1()           50  30.548   24.244    30.416    0.064          0         0
# 2 t2()           50   1.260    1.000     1.240    0.016          0         0
# 3 t3()           50   8.919    7.079     8.740    0.160          0         0
# 4 t4()           50   5.680    4.508     5.564    0.100          0         0
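If rbenchmark is not installed, a rough comparison can also be made with base R's `system.time()`. This is only a sketch under assumptions: the helper names `by_table` and `by_tabulate` are made up for illustration, the sample is smaller than the one above, and timings will vary by machine, so none are shown.

```r
set.seed(1)
a_small <- sample(20, 100000, replace = TRUE)

# table-based version: names are character, so convert back to integer
by_table <- function(v) {
  x <- table(v)
  as.integer(names(x)[x == max(x)])
}

# tabulate-based version: bin i holds the count of the value i
by_tabulate <- function(v) {
  x <- tabulate(v)
  which(x == max(x))
}

# Check that both approaches agree before timing them
stopifnot(identical(by_table(a_small), by_tabulate(a_small)))

system.time(for (i in 1:10) by_table(a_small))
system.time(for (i in 1:10) by_tabulate(a_small))
```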
