r - Vectorizing and parallelizing the decomposition of a list

Question

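Below is some code that generates a list of data.frames and then converts that original list into a new list in which each element is a list of the rows of every data frame.

For example:
- l1 has length 10, and each element is a data.frame with 1000 rows.
- l2 is a list of length 1000 (nrow(l1[[k]])), and each element is a list of length 10 (length(l1)) containing the row vectors taken from the elements of l1.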

## l1: 10 data frames, each with 1000 rows and 10 columns
l1 <- vector("list", length= 10)
set.seed(65L)
for (i in 1:10) {
  l1[[i]] <- data.frame(matrix(rnorm(10000),ncol=10))
}

## l2[[i]] collects row i of every data frame in l1
l2 <- vector(mode="list", length= nrow(l1[[1]]))
for (i in 1:nrow(l1[[1]])) {
  l2[[i]] <- lapply(l1, function(l) return(unlist(l[i,])))
}

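Edit: to clarify how l1 relates to l2, here is the same logic written as a plain double loop.

l2 <- rep(list(vector("list", length(l1))), nrow(l1[[1]]))
for (j in seq_along(l1)) {
  for (i in seq_len(nrow(l1[[1]]))) { # nrow(l1[[1]]) == nrow(l1[[k]]) for k = 2, ..., 10
    l2[[i]][[j]] <- unlist(l1[[j]][i, ])
  }
}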
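Because l2[[i]][[j]] depends only on the row index i and the position j in l1, the double loop can be written as two nested lapply calls: the outer one over row indices, the inner one over the data frames. A minimal sketch, using a small hypothetical stand-in for l1 (3 data frames of 4 rows and 2 columns) so it runs instantly:

```r
set.seed(65L)
## hypothetical small stand-in for l1: 3 data frames, 4 rows x 2 columns each
l1 <- lapply(1:3, function(k) data.frame(matrix(rnorm(8), ncol = 2)))

## outer lapply replaces the loop over rows i,
## inner lapply replaces the loop over data frames j
l2 <- lapply(seq_len(nrow(l1[[1]])), function(i) {
  lapply(l1, function(d) unlist(d[i, ]))
})
```

This removes the explicit preallocation but still subsets one data frame row at a time, so it should not be expected to be much faster than the loop.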

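How can I speed up the creation of l2 through vectorization or parallelization? The problem I run into is that parallel::parLapplyLB splits the list it is given; however, I don't want to split the list l1. What I want to split are the rows within each element of l1. An intermediate solution would be to vectorize my current approach by replacing the for loop with some *apply function. That could obviously be extended to a parallel solution as well.

If I solve this myself before an acceptable answer is posted, I will post my answer here.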
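parLapplyLB distributes whatever vector or list is passed as X. One way to split the rows instead of l1 is therefore to pass the row indices as X and hand l1 to every worker through the `...` argument. A minimal sketch, assuming a local PSOCK cluster with 2 workers; `row_across` is a hypothetical helper name and l1 is again a small stand-in:

```r
library(parallel)

set.seed(65L)
## hypothetical small stand-in for l1: 3 data frames, 4 rows x 2 columns each
l1 <- lapply(1:3, function(k) data.frame(matrix(rnorm(8), ncol = 2)))

## hypothetical helper: row i of every data frame in l1
row_across <- function(i, l1) lapply(l1, function(d) unlist(d[i, ]))

cl <- makeCluster(2L)  # assumed: 2 local PSOCK workers
## X is the vector of row indices, so the rows are distributed across workers;
## l1 is passed whole to each call through parLapplyLB's ... argument
l2_par <- parLapplyLB(cl, seq_len(nrow(l1[[1]])), row_across, l1 = l1)
stopCluster(cl)
```

Since l1 travels with the work sent to the workers, exporting it once with clusterExport() may be cheaper for large inputs, and for small data the communication overhead may outweigh any gain.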

score 1 · Accepted Answer

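I would break the list structure apart completely and rebuild it with split. This approach needs more memory than the original one, but at least for the given example it is more than 10 times faster: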

sgibb <- function(x) {
  ## get the lengths of all data.frames (equal to `sapply(x, ncol)`)
  n <- lengths(x)
  ## destroy the list structure
  y <- unlist(x, use.names = FALSE)
  ## generate row indices (stores the information which row the element in y
  ## belongs to)
  rowIndices <- unlist(lapply(n, rep.int, x=1L:nrow(x[[1L]])))
  ## split y first by rows
  ## and subsequently loop over these lists to split by columns
  lapply(split(y, rowIndices), split, f=rep.int(seq_along(n), n))
}

alex <- function(x) {
  l2 <- vector(mode="list", length= nrow(x[[1]]))
  for (i in 1:nrow(x[[1]])) {
    l2[[i]] <- lapply(x, function(l) return(unlist(l[i,])))
  }
  l2
}

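## check.attributes = FALSE is needed because the names differ: split() names
## the elements "1", "2", ..., while unlist(use.names = FALSE) drops column names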
all.equal(alex(l1), sgibb(l1), check.attributes=FALSE)

library(rbenchmark)
benchmark(alex(l1), sgibb(l1), order = "relative", replications = 10)
#       test replications elapsed relative user.self sys.self user.child sys.child
#2 sgibb(l1)           10   0.808    1.000     0.808        0          0         0
#1  alex(l1)           10  11.970   14.814    11.972        0          0         0
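To see what the index vectors in sgibb do, here are the same steps run on a tiny hypothetical input of two data frames with 3 rows and 2 integer columns each, so the intermediate values are easy to read:

```r
## two hypothetical data frames, 3 rows x 2 columns each
x <- list(data.frame(a = 1:3, b = 4:6), data.frame(a = 7:9, b = 10:12))

n <- lengths(x)                    # columns per data frame: 2 2
y <- unlist(x, use.names = FALSE)  # column by column: 1, 2, ..., 12
## row of each element of y: 1 2 3 1 2 3 1 2 3 1 2 3
rowIndices <- unlist(lapply(n, rep.int, x = 1L:nrow(x[[1L]])))
byRow <- split(y, rowIndices)      # byRow[["1"]] is 1 4 7 10
## within each row, group the values by the data frame they came from
res <- lapply(byRow, split, f = rep.int(seq_along(n), n))
```

Here res[["1"]] is list(`1` = c(1L, 4L), `2` = c(7L, 10L)), i.e. row 1 of each data frame. The trick relies on every data frame having the same number of rows, and unlist() coerces all columns to one common type, so it suits all-numeric data like the example.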
