
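I need to run Hmisc::rcorr() on a large in-memory data.table (various preceding functions and subsetting steps require this format). The object can be read and loaded with fread() within the available RAM, but the rcorr() operation eventually fails: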

> Error: cannot allocate vector of size 1.2 Gb
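
For scale, here is a rough back-of-the-envelope sketch of why the allocation is this large. It assumes about 18000 columns (the approximate figure given for data_DT below) and that the result holds one cell per pair of columns. The variable names are illustrative only, and the exact intermediate objects rcorr() allocates are not confirmed here.

```r
# Assumption: roughly 18,000 columns, the approximate width of data_DT
n_cols <- 18000

# One n_cols x n_cols matrix of doubles (8 bytes per cell)
bytes_double <- n_cols^2 * 8

# One n_cols x n_cols matrix of integers (4 bytes per cell)
bytes_integer <- n_cols^2 * 4

# Sizes in units of 1024^3 bytes
round(bytes_double / 1024^3, 2)   # 2.41
round(bytes_integer / 1024^3, 2)  # 1.21
```

A single square matrix with 4-byte cells is therefore close to the 1.2 Gb in the error, so the 800-row input itself is unlikely to be the bottleneck; the pairwise output is.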

I have therefore turned to various packages for handling larger-than-memory data. I first tried bigmemory, but could not get started:

# data_DT is a data.table of ~18000 columns and ~800 rows

# rcorr requires a matrix as input
rcorr(
  x = as.big.matrix(as.matrix(data_DT)),
  type = "pearson"
)

#> Warning in is.na(x) :
#>   is.na() applied to non-(list or vector) of type 'S4'
#> Error in SetIndivVectorElements.bm(x, i, value) : 
#>   Logical indices not allowed when subsetting by a matrix.

I then moved to disk.frame, also without success:

# Initialise parallel processing backend via future
setup_disk.frame()

# Enable large datasets to be transferred between sessions
options(future.globals.maxSize = Inf)


# Create filebacked disk.frame
test_df <- as.disk.frame(
  data_DT, 
  outdir = file.path(tempdir(), "test_tmp.df"),
  nchunks = recommend_nchunks(data_DT, conservatism = 4),
  overwrite = TRUE
)


# Correlation test
test_rcorr <- test_df %>%
  as.matrix %>%                  # It appears to fail at this point
  rcorr(type = "pearson") %>%
  collect_list

#> Error in dimnames(data) <- dimnames : 
#> length of 'dimnames' [1] not equal to array extent


dimnames(test_DT_df)
#> NULL

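I also tried creating a disk.frame after converting the data.table to a matrix, but it does not accept anything other than objects of class data.frame.

Any help would be appreciated.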
