r - Extracting patches from a matrix / copy-on-write

Question

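I have a fairly large (1040x1392) matrix of doubles, and I want to extract another matrix whose columns are the 16x16 patches of the first matrix. (I know, this is a lot of data and it may not be practical to work with, but this should work...)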

I tried the following code, where "data" is the original matrix:

# Create a matrix of starting coordinates for each patch
patch.size = 16
patch.inc = patch.size - 1
coords = expand.grid(x=1:(ncol(data)-patch.inc), y=1:(nrow(data)-patch.inc))
coords = as.matrix(coords)

# Pre-allocate the destination matrix
patches = double(nrow(coords)*patch.size^2)
dim(patches) = c(patch.size^2, nrow(coords))

#Create overlapping patches
for (i in 1:nrow(coords))
{
  x=coords[i,1]
  y=coords[i,2]
  patches[,i] = as.vector(data[y:(y+patch.inc), x:(x+patch.inc)])
}

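This runs very slowly on a fairly fast Win7-64 machine with 8GB of RAM; even creating just 100 patches is slow.

It turns out that the assignment to patches[,i] is the problem. Watching Task Manager, I see a huge spike in memory usage whenever I assign to patches[,i].

I have a couple of questions. First, what is going on? It looks like the entire patches matrix is being copied on every assignment. Is that right? If so, why? I thought pre-allocating the patches matrix would avoid this. Second, is there a better way to write this code so that it finishes in my lifetime :-)?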

Thanks! Kent

score 1 · Accepted Answer

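For your second question, here is a solution using lapply.

You can transpose the result out if you want exactly the same output as your script. I checked with smaller dimensions and verified that it equals your output patches.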
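The small-scale check mentioned above might look something like the sketch below. The 20 x 24 test matrix is an assumption chosen only to keep the run fast; the loop is the one from the question and the lapply call is the one from this answer.

```r
# Small test matrix (hypothetical size, just for a quick check)
set.seed(1234)
data <- matrix(rnorm(20 * 24), nrow = 20)
patch.size <- 16
patch.inc <- patch.size - 1

# Loop version from the question
coords <- as.matrix(expand.grid(x = 1:(ncol(data) - patch.inc),
                                y = 1:(nrow(data) - patch.inc)))
patches <- matrix(0, nrow = patch.size^2, ncol = nrow(coords))
for (i in seq_len(nrow(coords))) {
  x <- coords[i, 1]
  y <- coords[i, 2]
  patches[, i] <- as.vector(data[y:(y + patch.inc), x:(x + patch.inc)])
}

# lapply version from this answer (one patch per row)
idx <- expand.grid(1:(ncol(data) - patch.size + 1),
                   1:(nrow(data) - patch.size + 1))
idx[, 3] <- idx[, 1] + patch.size - 1
idx[, 4] <- idx[, 2] + patch.size - 1
idx <- as.matrix(idx)
out <- do.call(rbind, lapply(seq_len(nrow(idx)), function(tx)
  c(data[idx[tx, 2]:idx[tx, 4], idx[tx, 1]:idx[tx, 3]])))

# Transposing out gives one patch per column, as in the question
same <- all(dim(t(out)) == dim(patches)) && all(t(out) == patches)
same
```

Both versions enumerate the starting coordinates with expand.grid, so the patches come out in the same order and a direct elementwise comparison is enough.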

set.seed(1234)
nr <- 1040
nc <- 1392
data <- matrix(rnorm(nr*nc), nrow = nr)
patch.size <- 16
idx <- expand.grid(1:(ncol(data)-patch.size+1), 1:(nrow(data)-patch.size+1))
idx[,3] <- idx[,1]+patch.size-1
idx[,4] <- idx[,2]+patch.size-1
idx <- as.matrix(idx)

# using rbenchmark
require(rbenchmark)
myFun <- function() {
    out <- do.call(rbind, lapply(1:nrow(idx), 
        function(tx) c(data[idx[tx,2]:idx[tx,4], idx[tx,1]:idx[tx,3]])))
}
benchmark(myFun(), replications = 2)

# Result:
     test replications elapsed relative user.self sys.self user.child sys.child
1 myFun()            2 152.146        1   147.957    4.184          0         0

# using system.time
system.time(out <- do.call(rbind, lapply(1:nrow(idx), 
        function(tx) c(data[idx[tx,2]:idx[tx,4], idx[tx,1]:idx[tx,3]]))))        

# Result
  user  system elapsed 
58.852   1.784  60.638
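As an alternative to calling a function once per patch, all patches can in principle be pulled out with a single subsetting operation on linear indices. The sketch below is not part of the original answer: extract_patches is a hypothetical helper, and it is shown on a small 20 x 24 matrix. Keep in mind that for the full 1040x1392 matrix there are 1025 * 1377 = 1,411,425 patches of 256 doubles each, so the result alone needs roughly 2.9 GB and the index matrix built here is about the same size again; on an 8GB machine it would probably have to be run over chunks of start positions.

```r
# Hypothetical helper (not from the original answer): extract every
# patch.size x patch.size patch with one vectorised subsetting step.
extract_patches <- function(data, patch.size) {
  nr <- nrow(data)
  inc <- patch.size - 1
  # Linear-index offsets inside one patch, column-major (row varies fastest),
  # matching as.vector() applied to a sub-matrix
  off <- rep(0:inc, times = patch.size) + rep(0:inc, each = patch.size) * nr
  # Top-left corners, with x varying fastest as in the question's expand.grid
  coords <- expand.grid(x = 1:(ncol(data) - inc), y = 1:(nrow(data) - inc))
  start <- coords$y + (coords$x - 1) * nr
  # One column of linear indices per patch
  index <- outer(off, start, "+")
  # as.vector() so that a two-column index is not read as (row, col) pairs
  patches <- data[as.vector(index)]
  dim(patches) <- dim(index)
  patches
}

set.seed(1234)
small <- matrix(rnorm(20 * 24), nrow = 20)
p <- extract_patches(small, 16)
dim(p)  # 256 45
```

Each column of p is one patch in the same column-major order as as.vector(data[y:(y+15), x:(x+15)]), so it should be directly comparable with the patches matrix from the question.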
