r - 查找具有更高索引的第一个更大元素

Question

我有两个向量，A和B。对于中的每个元素，A我想找到其中第一个元素的索引B更大并且具有更高的索引。A和的长度B相同。

所以对于向量：

A <- c(10, 5, 3, 4, 7)

B <- c(4, 8, 11, 1, 5)

我想要一个结果向量：

R <- c(3, 3, 5, 5, NA)

当然我可以用两个循环来做，但是它很慢，而且我不知道在这种情况下如何使用 apply() ，当索引很重要时。我的数据集的向量长度为 20000，因此在这种情况下速度非常重要。

几个额外的问题：

如果我有一个数字序列（如seq = 2:10），并且我想为 A 的每个 a 和 seq 的每个 s 找到其中第一个B高于 a+s的数字怎么办？
与问题 1) 一样，但我想知道第一个较大的值和第一个较低的值，并创建一个矩阵，该矩阵存储哪个是第一个。例如，我有一个 ofA和10 来自 seq。我想找到B高于a+10或低于a-10的第一个值，然后存储它的索引和值。

score 6 · Accepted Answer

6

sapply(sapply(seq_along(a),function(x) which(b[-seq(x)]>a[x])+x),"[",1)
[1]  3  3  5  5 NA

于 2012-12-01T09:55:15.653 回答

score 6 · Accepted Answer

这是 sapply 效率低于循环的一个很好的例子。尽管 sapply 确实使代码看起来更整洁，但随着时间的推移，你正在为这种整洁付出代价。

相反，您可以将 while 循环包裹在 for 循环中，并包含在一个漂亮、整洁的函数中。

以下是比较嵌套应用循环与嵌套 for-while 循环（以及混合应用-while 循环，为了更好的衡量标准）的基准。更新：添加了vapply..match..评论中提到的内容。比 sapply 快，但仍然比 while 循环慢得多。

基准：

           test elapsed relative
1     for.while   0.069    1.000
2  sapply.while   0.080    1.159
3  vapply.match   0.101    1.464
4 nested.sapply   0.104    1.507

请注意，您节省了三分之一的时间；当您开始将序列添加到 A 时，节省的费用可能会更大。

对于您问题的第二部分：

如果您将所有这些都包含在一个不错的函数中，则很容易将 seq 添加到 A

# Sample data
A <- c(10, 5, 3, 4, 7, 100, 2)
B <- c(4, 8, 11, 1, 5, 18, 20)

# Sample sequence
S <- seq(1, 12, 3)

# marix with all index values (with names cleaned up)   
indexesOfB <- t(sapply(S, function(s) findIndx(A+s, B)))
dimnames(indexesOfB) <- list(S, A)

最后，如果您想找到 B小于A 的值，只需交换函数中的操作即可。
（您可以在函数中包含一个 if 子句并仅使用一个函数。我发现拥有两个单独的函数更有效）

findIndx.gt(A, B)   #  [1]  3  3  5  5  6 NA  8 NA NA
findIndx.lt(A, B)   #  [1]  2  4  4 NA  8  7 NA NA NA

然后你可以把它包在一个漂亮的包装里

rangeFindIndx(A, B, S)
 #     A   S  indxB.gt indxB.lt
 #    10   1        3        2
 #     5   1        3        4
 #     3   1        5        4
 #     4   1        5       NA
 #     7   1        6       NA
 #   100   1       NA       NA
 #     2   1       NA       NA
 #    10   4        6        4
 #     5   4        3        4
 #   ...

职能

（注意它们依赖于reshape2）

rangeFindIndx <- function(A, B, S) {
  # For each s in S, and for each a in A,
  # find the first value of B, which is higher than a+s, or lower than a-s

  require(reshape2)

  # Create gt & lt matricies;  add dimnames for melting function
  indexesOfB.gt <- sapply(S, function(s) findIndx.gt(A+s, B))
  indexesOfB.lt <- sapply(S, function(s) findIndx.lt(A-s, B))
  dimnames(indexesOfB.gt) <- dimnames(indexesOfB.gt) <- list(A, S)

  # melt the matricies and combine into one
  gtltMatrix <- cbind(melt(indexesOfB.gt), melt(indexesOfB.lt)$value)

  # clean up their names
  names(gtltMatrix) <- c("A", "S", "indxB.gt", "indxB.lt")

  return(gtltMatrix)
}

findIndx.gt <- function(A, B) {
  lng <- length(A)
  ret <- integer(0)
  b <- NULL
  for (j in seq(lng-1)) {
    i <- j + 1
    while (i <= lng && ((b <- B[[i]]) < A[[j]]) ) {
      i <- i + 1
    }
    ret <- c(ret, ifelse(i<lng, i, NA))
  }
  c(ret, NA)  
}

findIndx.lt <- function(A, B) {
  lng <- length(A)
  ret <- integer(0)
  b <- NULL
  for (j in seq(lng-1)) {
    i <- j + 1
    while (i <= lng && ((b <- B[[i]]) > A[[j]]) ) {   # this line contains the only difference from findIndx.gt
      i <- i + 1
    }
    ret <- c(ret, ifelse(i<lng, i, NA))
  }
  c(ret, NA)  
}

r - 查找具有更高索引的第一个更大元素

2 回答 2

基准：

对于您问题的第二部分：

职能

Related

Reference