
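I need to generate the indices (row and column pairs) of the lower triangle of a matrix. My current implementation is inefficient in terms of memory, especially when the symmetric matrix gets large (more than 50K rows). Is there a better way?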

rows <- 2e+01
id <- which(lower.tri(matrix(, rows, rows)) == TRUE, arr.ind=T)
head(id)

#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   4   1
# [4,]   5   1
# [5,]   6   1
# [6,]   7   1

2 Answers


Here is another approach:

z <- sequence(rows)
cbind(
  row = unlist(lapply(2:rows, function(x) x:rows), use.names = FALSE),
  col = rep(z[-length(z)], times = rev(tail(z, -1))-1))
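
To make the two pieces of this expression easier to follow, here is a small worked sketch. The choice of rows = 4 and the variable names row, col, idx and ref are only for illustration; the cross-check at the end uses the matrix-based approach from the question.

```r
# Worked example for a small matrix (rows = 4 is an illustrative choice)
rows <- 4
z <- sequence(rows)   # 1 2 3 4

# Row indices: for each column j, the rows below the diagonal are (j + 1):rows
row <- unlist(lapply(2:rows, function(x) x:rows), use.names = FALSE)
# row is 2 3 4 3 4 4

# Column indices: column j is repeated (rows - j) times
col <- rep(z[-length(z)], times = rev(tail(z, -1)) - 1)
# col is 1 1 1 2 2 3

idx <- cbind(row = row, col = col)

# Cross-check against the matrix-based approach from the question
ref <- which(lower.tri(matrix(, rows, rows)), arr.ind = TRUE)
stopifnot(all(idx == ref))
```

The pairs come out in column-major order, which is the same order that which(..., arr.ind = TRUE) returns.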

Benchmark with larger data:

library(microbenchmark)

rows <- 1000
m <- matrix(, rows, rows)

## Your current approach
fun1 <- function() which(lower.tri(m) == TRUE, arr.ind=TRUE)

## An improvement of your current approach
fun2 <- function() which(lower.tri(m), arr.ind = TRUE)

## The approach shared in this answer
fun3 <- function() {
  z <- sequence(rows)
  cbind(
    row = unlist(lapply(2:rows, function(x) x:rows), use.names = FALSE),
    col = rep(z[-length(z)], times = rev(tail(z, -1))-1))
}

## Sven's answer
fun4 <- function() {
  row <- rev(abs(sequence(seq.int(rows - 1)) - rows) + 1)
  col <- rep.int(seq.int(rows - 1), rev(seq.int(rows - 1)))
  cbind(row, col)
}

microbenchmark(fun1(), fun2(), fun3(), fun4())
# Unit: milliseconds
#    expr       min        lq   median       uq       max neval
#  fun1() 77.813577 85.343356 90.60689 95.71648 130.40059   100
#  fun2() 73.812204 82.103600 85.87555 90.59235 138.66547   100
#  fun3()  9.016237  9.382506 10.63291 13.20085  55.42137   100
#  fun4() 20.591863 24.999702 28.82232 31.90663  65.05169   100
answered 2014-01-03T07:54:38.323

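Your approach is slow because several matrices have to be created. You create the first matrix with matrix(). The function lower.tri creates 3 matrices internally. Comparing the result with TRUE creates a fifth matrix. By the way, the comparison with TRUE is not necessary.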
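
As a rough illustration of why those intermediate matrices matter at the size mentioned in the question, the sketch below estimates the memory taken by a single n-by-n matrix. It assumes 4 bytes per element, which is how R stores logical and integer vectors; the names n, bytes_per_matrix and gib_per_matrix are illustrative only.

```r
# Rough estimate of the memory held by one n-by-n logical or integer matrix.
# Assumption: 4 bytes per element (R's storage for logical and integer vectors).
n <- 50000
bytes_per_matrix <- n^2 * 4
gib_per_matrix <- bytes_per_matrix / 2^30   # roughly 9.3 GiB per matrix

# Small-scale sanity check with a matrix that comfortably fits in memory
m <- matrix(, 1000, 1000)
stopifnot(is.logical(m), length(m) == 1e6)
print(object.size(m), units = "Mb")
```

With several such matrices alive at once, the matrix-based approach needs a multiple of that figure, which is consistent with the memory problems described in the question.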

The following approach does not create any matrices; it computes the indices directly:

rows <- 2e+01 # number of rows and columns (20)

x <- rev(abs(sequence(seq.int(rows - 1)) - rows) + 1)
y <- rep.int(seq.int(rows - 1), rev(seq.int(rows - 1)))

idx <- cbind(x, y)

(If you want a slightly faster approach, you can assign the result of seq.int(rows - 1) to a variable instead of calling it three times.)
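
A minimal sketch of that variant follows. The variable name s is an illustrative choice, not part of the original answer; the rest of the computation is unchanged.

```r
rows <- 20               # number of rows and columns

s <- seq.int(rows - 1)   # computed once, reused three times

x <- rev(abs(sequence(s) - rows) + 1)   # row indices
y <- rep.int(s, rev(s))                 # column indices

idx <- cbind(x, y)
```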

Comparison with the original solution:

id <- which(lower.tri(matrix(, rows, rows)) == TRUE, arr.ind=T)

all(id == idx)
# TRUE
answered 2014-01-03T07:40:48.757