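I need to run Hmisc::rcorr() on a large in-memory data.table (the various preceding functions and subsetting steps all require this format). The object can be read and loaded with fread() within the available RAM, but the rcorr() operation eventually fails: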
> Error: cannot allocate vector of size 1.2 Gb
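For scale, here is a rough back-of-the-envelope sketch of how large a single square result matrix would be. It assumes exactly 18000 columns (the real count is only approximately that) and that the result holds p x p matrices of doubles and integers. The variable names and the gib() helper are made up for illustration.

```r
# Assumption: p = 18000 columns exactly (the data has "~18000").
p <- 18000

# One p x p matrix of doubles (8 bytes per cell), e.g. correlation coefficients.
bytes_double <- p^2 * 8

# One p x p matrix of integers (4 bytes per cell), e.g. pairwise observation counts.
bytes_integer <- p^2 * 4

# Hypothetical helper: convert bytes to units of 1024^3 bytes.
gib <- function(b) b / 2^30

size_double  <- round(gib(bytes_double), 2)   # 2.41
size_integer <- round(gib(bytes_integer), 2)  # 1.21
print(c(double = size_double, integer = size_integer))
```

Under these assumptions a single integer matrix of that shape is in the same range as the 1.2 Gb in the error message, although I have not verified which allocation actually fails.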
I have therefore turned to various packages for handling large in-memory data. I first tried bigmemory, but could not get started:
library(Hmisc)      # provides rcorr()
library(bigmemory)  # provides as.big.matrix()
# data_DT is a data.table of ~18000 columns and ~800 rows
# rcorr requires a matrix as input
rcorr(
  x = as.big.matrix(as.matrix(data_DT)),
  type = "pearson"
)
#> Warning in is.na(x) :
#> is.na() applied to non-(list or vector) of type 'S4'
#> Error in SetIndivVectorElements.bm(x, i, value) :
#> Logical indices not allowed when subsetting by a matrix.
I then moved on to disk.frame, again without success:
library(disk.frame)
# Initialise parallel processing backend via future
setup_disk.frame()
# Enable large datasets to be transferred between sessions
options(future.globals.maxSize = Inf)
# Create filebacked disk.frame
test_df <- as.disk.frame(
  data_DT,
  outdir = file.path(tempdir(), "test_tmp.df"),
  nchunks = recommend_nchunks(data_DT, conservatism = 4),
  overwrite = TRUE
)
# Correlation test
test_rcorr <- test_df %>%
  as.matrix %>% # It appears to fail at this point
  rcorr(type = "pearson") %>%
  collect_list
#> Error in dimnames(data) <- dimnames :
#> length of 'dimnames' [1] not equal to array extent
dimnames(test_df)
#> NULL
I also tried creating a disk.frame after converting the data.table to a matrix, but it does not accept anything other than objects of class data.frame.
I would appreciate any help.