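I'm new to R and am having trouble vectorizing a nested loop that is particularly slow. The loop goes through a list of centers (vectors stored as the rows of a matrix) and finds the distance between each center and each row of another matrix (clusterMembers in the example below). I know this needs to be vectorized for speed, but I can't figure out the appropriate functions, or how to use apply, to do so.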

clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)

features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))

for(c in 1:dim(clusterCenters)[1]){
  center <- clusterCenters[c,]
  for(v in 1:(dim(clusterMembers)[1])){
    vector <- clusterMembers[v,]
    features[v,c] <- sqrt(sum((center - vector)^2))
  }
}

Thanks for your help.

1 Answer

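You can take advantage of R's recycling rules to make this faster. But you have to know, and account for, the fact that R stores matrices in column-major order. You do that by transposing clusterMembers; the center vector is then recycled down the columns of t(clusterMembers), so each column (one member) lines up element by element with the center.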
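
To see why the transpose matters, here is a tiny sketch with made-up numbers (the names m, center, wrong and right are only for illustration and are not part of the code below). Subtracting a vector from a matrix recycles the vector down the columns, so it only lines up with the members once each member sits in a column:

```r
# A small made-up matrix: 3 members (rows) with 2 coordinates each
m <- matrix(c(1, 2, 3,
              10, 20, 30), nrow = 3)
center <- c(1, 10)

# Recycling walks down the columns of m, so center is paired with the
# wrong elements here (and R gives no warning, since 6 is a multiple of 2)
wrong <- m - center

# After transposing, each column is one member with the same length as
# center, so every column has center subtracted element by element
right <- t(m) - center

# Euclidean distance of each member to center
dists <- sqrt(colSums(right^2))
```

The first member is equal to center, so its distance comes out as 0; the same colSums pattern is what the loop in the function below applies once per center.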

set.seed(21)
clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)
# your original code in function form
seven <- function() {
  features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))
  for(c in 1:dim(clusterCenters)[1]){
    center <- clusterCenters[c,]
    for(v in 1:(dim(clusterMembers)[1])){
      vector <- clusterMembers[v,]
      features[v,c] <- sqrt(sum((center - vector)^2))
    }
  }
  features
}
# my fancy function
josh <- function() {
  tcm <- t(clusterMembers)
  Features <- matrix(0,ncol(tcm),nrow(clusterCenters))
  for(i in 1:nrow(clusterCenters)) {
    # clusterCenters[i,] returns a vector because drop=TRUE by default
    Features[,i] <- colSums((clusterCenters[i,]-tcm)^2)
  }
  Features <- sqrt(Features)  # outside the loop to avoid function calls
}
system.time(seven())
#    user  system elapsed 
#     2.7     0.0     2.7 
system.time(josh())
#    user  system elapsed 
#    0.28    0.11    0.39 
identical(seven(),josh())
# [1] TRUE
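
If you want to drop the remaining loop as well, one possible approach (a different technique from the recycling trick above, and only a sketch) is to expand the squared distance as |m|^2 + |c|^2 - 2 m.c and let a single matrix product do the work. The helper name distNoLoop is hypothetical. Because the arithmetic is done in a different order, the result should agree with the loop versions up to floating-point tolerance (compare with all.equal), not necessarily with identical.

```r
set.seed(21)
clusterCenters <- matrix(runif(10000), nrow = 100)
clusterMembers <- matrix(runif(400000), nrow = 4000)

# Hypothetical helper (the name is not from the original post)
distNoLoop <- function(members, centers) {
  # squared distances via |m|^2 + |c|^2 - 2 * m . c
  d2 <- outer(rowSums(members^2), rowSums(centers^2), "+") -
    2 * tcrossprod(members, centers)
  # rounding can leave tiny negative values; clamp them before sqrt
  sqrt(pmax(d2, 0))
}

# one row per member, one column per center
features <- distNoLoop(clusterMembers, clusterCenters)
dim(features)
```

Note that this builds a few full-size intermediate matrices, so it trades memory for speed; whether it is actually faster for your data is something to check with system.time.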
answered 2013-03-04T18:41:09.683