
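The script below worked for me with the same data when I was calculating Pearson's correlations. I recently adapted it to build a covariance matrix to feed into a PCA. I read on a forum that passing in a pre-computed covariance matrix might avoid memory problems, but that has not been the case for me. I get the following error when computing the covariance matrix: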

Error: cannot allocate vector of size 1.1 Gb
In addition: Warning messages:
1: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)
2: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)
3: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)
4: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)

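Can anyone suggest a more efficient way to do this so that I don't run into memory problems? If I'm completely off base computing the covariance first, that's fine; PCA is the only thing I ultimately need. My data are 12 single-band rasters in ArcGIS grid format, each 581.15 MB in size. Any help is greatly appreciated.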
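One way to keep memory bounded is to never hold two whole rasters at once and instead accumulate the covariance matrix over blocks of cells. The sketch below is a base-R illustration under stated assumptions, not a drop-in replacement for the script: chunked_cov() is a hypothetical helper, each block is assumed to be a numeric matrix with one column per layer and one row per cell, and rows containing any NA are dropped listwise (which can differ from the pairwise use = 'complete.obs' calls in the question). The example data are simulated.

```r
# Hypothetical helper: accumulate a sample covariance matrix one block of cells
# at a time, so only one block (not whole rasters) has to be held in memory.
# Assumption: `chunks` is a list of numeric matrices, one column per layer.
# Rows with any NA are dropped (listwise deletion).
chunked_cov <- function(chunks) {
  n <- 0
  shift <- NULL  # first block's column means; subtracting them limits round-off
  sums <- NULL   # running column sums of the shifted values
  cross <- NULL  # running cross-products of the shifted values
  for (chunk in chunks) {
    chunk <- chunk[stats::complete.cases(chunk), , drop = FALSE]
    if (nrow(chunk) == 0) next
    if (is.null(shift)) {
      shift <- colMeans(chunk)
      sums <- numeric(ncol(chunk))
      cross <- matrix(0, ncol(chunk), ncol(chunk))
    }
    centred <- sweep(chunk, 2, shift)
    n <- n + nrow(centred)
    sums <- sums + colSums(centred)
    cross <- cross + crossprod(centred)
  }
  if (n < 2) stop("need at least two complete rows")
  means <- sums / n
  # Covariance is unchanged by the shift: (sum of x x' - n m m') / (n - 1)
  cov_mat <- (cross - n * tcrossprod(means)) / (n - 1)
  dimnames(cov_mat) <- list(colnames(chunks[[1]]), colnames(chunks[[1]]))
  cov_mat
}

# Usage on simulated data: 1000 "cells", 3 "layers", split into blocks of 250
set.seed(42)
m <- matrix(rnorm(3000, mean = 500), ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
m[7, 2] <- NA
blocks <- lapply(split(seq_len(nrow(m)), ceiling(seq_len(nrow(m)) / 250)),
                 function(rows) m[rows, , drop = FALSE])
# Compare with cov() on the full matrix (listwise deletion as well)
all.equal(chunked_cov(blocks), cov(m, use = "complete.obs"))
```

With the real rasters, the blocks might be read from a stack of the 12 layers using the raster package's blockSize() and getValues(x, row, nrows); that wiring is an untested assumption and is not shown here. The resulting matrix can then be passed to princomp(covmat = ...).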

library(rgdal)
library(raster)


setwd("K:/Documents/SDSU/Thesis/GIS Data All/GIS Layers/Generated_Layers/GridsForCor")


# Names of the raster grids, relative to the working directory set above:
raster_files = c('aspectclp',
                 'lakedistclp',
                 'ocdistclp',
                 'popdenclp',
                 'roaddistclp',
                 'scurveclp',
                 'sdemclp',
                 'solarradclp',
                 'sslopeclp',
                 'vegcatclp',
                 'canopcvrclp',
                 'canophtclp')

cov_matrix <- matrix(NA_real_, length(raster_files), length(raster_files),
                     dimnames = list(raster_files, raster_files))
for (outer_n in seq_along(raster_files)) {
  outer_raster <- raster(raster_files[outer_n])
  # Start the inner loop at outer_n so that each covariance is computed only
  # once, then mirror it into the lower triangle. Unlike a correlation matrix,
  # the diagonal of a covariance matrix holds each raster's variance (not 1),
  # so it has to be computed as well: princomp() needs the full symmetric
  # matrix and fails on NA entries.
  for (inner_n in outer_n:length(raster_files)) {
    inner_raster <- raster(raster_files[inner_n])
    # method = "spearman" gives the covariance of the ranks; drop it for the
    # ordinary (Pearson) covariance of the cell values.
    cov_value <- cov(outer_raster[], inner_raster[],
                     use = 'complete.obs', method = "spearman")
    cov_matrix[outer_n, inner_n] <- cov_value
    cov_matrix[inner_n, outer_n] <- cov_value
  }
}

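# With a ready-made covariance matrix, princomp() does not need the raw data
# (an 'x' supplied alongside covmat is ignored), but it cannot compute scores.
pca_matrix <- princomp(covmat = cov_matrix)

# princomp() returns a list, so write out the loadings matrix instead
write.table(unclass(loadings(pca_matrix)), "PCA.txt", sep = "\t")
write.csv(unclass(loadings(pca_matrix)), "PCA.csv")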
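Because princomp() accepts a precomputed covariance matrix through its covmat argument, the PCA step can be sanity-checked on its own before running it on the rasters. This sketch uses R's built-in USArrests data purely for illustration; it shows that the component variances (sdev squared) are the eigenvalues of the supplied matrix, and that component scores are not computed in this mode because no raw data are given.

```r
# PCA from a precomputed covariance matrix; USArrests is only example data
cv <- cov(USArrests)
pc <- princomp(covmat = cv)

summary(pc)    # standard deviation and proportion of variance per component
loadings(pc)   # eigenvectors of cv (column signs are arbitrary)
pc$scores      # no raw data supplied, so no component scores

# Component variances should match the eigenvalues of the supplied matrix
ev <- eigen(cv, symmetric = TRUE)$values
all.equal(unname(pc$sdev^2), ev)
```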

1 Answer


I ran into similar difficulties performing a PCA on ffdf objects. Try inserting a gc() call in your (inner) loop, like this:

for (inner_n in outer_n:length(raster_files)) {
  # Include inner_n == outer_n: the diagonal holds each raster's variance
  inner_raster <- raster(raster_files[inner_n])
  cov_value <- cov(outer_raster[], inner_raster[],
                   use = 'complete.obs', method = "spearman")
  cov_matrix[outer_n, inner_n] <- cov_value
  cov_matrix[inner_n, outer_n] <- cov_value
  # Force an immediate garbage collection before the next iteration
  gc()
}

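This forces an immediate garbage collection, which can free enough memory for the for loop to continue. At least for me, that was enough to make it work.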
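The same idea can be tried in a self-contained version of the loop. Here load_layer() is a hypothetical stand-in that simulates reading one raster's cell values into a numeric vector (it is not part of the raster package), and the layer count and size are arbitrary. Since gc() can only reclaim objects that are no longer referenced, the sketch rm()s the large vectors before calling it.

```r
# Hypothetical stand-in for reading one raster's cell values into memory
# (in the question this corresponds to raster(raster_files[i])[]).
# Note: set.seed() here resets the global RNG; acceptable for a demo only.
load_layer <- function(i, n_cells = 1e6) {
  set.seed(i)
  rnorm(n_cells)
}

n_layers <- 4
cov_matrix <- matrix(NA_real_, n_layers, n_layers)
for (outer_n in seq_len(n_layers)) {
  outer_vals <- load_layer(outer_n)
  for (inner_n in outer_n:n_layers) {
    inner_vals <- load_layer(inner_n)
    cov_value <- cov(outer_vals, inner_vals, use = "complete.obs")
    cov_matrix[outer_n, inner_n] <- cov_value
    cov_matrix[inner_n, outer_n] <- cov_value
    # Drop the only reference to the large vector, then collect immediately
    rm(inner_vals)
    invisible(gc())
  }
  rm(outer_vals)
  invisible(gc())
}
cov_matrix
```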

answered 2014-06-25T09:53:49.010