
Does R support large sparse matrices? I am currently working with a sparse square matrix of about 1.9M x 1.9M with a density of roughly 0.001.

I want to stress-test creating this matrix in R on my AWS spot instance, which has 480 GB of memory.

library(Matrix)

DIMS = as.numeric(1988463)
DENSITY = as.numeric(0.001)
VALS = as.numeric(DIMS*DIMS*DENSITY)   # about 3.95e9 non-zero entries

# random row/column indices and Poisson-distributed values
i <- sample(DIMS, VALS, replace = TRUE)
j <- sample(DIMS, VALS, replace = TRUE)
x <- rpois(VALS, 10)

sp_matrix <- sparseMatrix(i = i, 
                          j = j, 
                          x = as.numeric(x), 
                          dims=list(DIMS, DIMS))

However, I get this error:

Error in validityMethod(as(object, superClass)): long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522
Traceback:

1. system.time(sp_matrix <- sparseMatrix(i = i, j = j, x = as.numeric(x), 
 .     dims = list(DIMS, DIMS)))
2. sparseMatrix(i = i, j = j, x = as.numeric(x), dims = list(DIMS, 
 .     DIMS))
3. validObject(r)
4. anyStrings(validityMethod(as(object, superClass)))
5. isTRUE(x)
6. validityMethod(as(object, superClass))
Timing stopped at: 76.42 73.41 151
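The parameters themselves hint at why this fails. A minimal R check, assuming (as the Matrix documentation describes) that a dgCMatrix stores its row indices `i` and column pointers `p` as ordinary R integer vectors, so the number of stored entries cannot exceed `.Machine$integer.max`:

```r
# Non-zero count implied by the parameters in the question
DIMS    <- 1988463
DENSITY <- 0.001
nnz     <- DIMS * DIMS * DENSITY   # computed in double precision, no overflow here

# Largest value a (32-bit) R integer can hold: 2147483647
int_max <- .Machine$integer.max

# Assumption: dgCMatrix indexes its entries with R integers,
# so a count above int_max cannot be represented
nnz > int_max
nnz / int_max                      # roughly 1.84 times the limit
```

If that assumption holds, no amount of RAM helps: about 3.95e9 entries is roughly 1.84 times what the index slots can address.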

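Is there any package or workaround for this? Ultimately, I plan to use the reticulate package to load a sparse CSR matrix built with numpy/scipy, so that I can run GloVe with the faster and more memory-efficient text2vec package, which requires the data in dgCMatrix format.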
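For the stress test itself, one option is to cap the density so that the non-zero count fits in a 32-bit R integer. With 1,988,463 rows that works out to a density of about 5.4e-4, assuming dgCMatrix allows at most `.Machine$integer.max` (2,147,483,647) stored entries. The sketch below wraps the question's generation code in a guard. `make_random_sparse` is a hypothetical helper, and the limit it checks is an assumption about Matrix's integer indices, not something measured here.

```r
library(Matrix)

# Hypothetical helper: random dims x dims dgCMatrix with Poisson values.
# Refuses parameter combinations whose entry count cannot fit in an R
# integer (assumed to be the dgCMatrix limit).
make_random_sparse <- function(dims, density, lambda = 10) {
  nnz <- dims * dims * density
  if (nnz > .Machine$integer.max) {
    stop(sprintf(
      "requested %.3g non-zeros, above .Machine$integer.max (%d); largest usable density is %.3g",
      nnz, .Machine$integer.max, .Machine$integer.max / dims^2))
  }
  nnz <- as.integer(round(nnz))
  i <- sample.int(dims, nnz, replace = TRUE)
  j <- sample.int(dims, nnz, replace = TRUE)
  x <- as.numeric(rpois(nnz, lambda))
  # sparseMatrix() adds up duplicate (i, j) pairs, so the stored
  # entry count can be slightly below nnz
  sparseMatrix(i = i, j = j, x = x, dims = c(dims, dims))
}

set.seed(42)
m <- make_random_sparse(10000, 0.001)   # 1e5 requested entries
class(m)
```

Calling `make_random_sparse(1988463, 0.001)` stops immediately with the suggested maximum density instead of spending minutes before failing inside the validity check.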

EDIT

I also tried simulating a large sparse matrix with spam, using the following line of code:

library(spam)
test_matrix <- spam_random(nrow = 1900000, ncol = 1900000, density = 0.001)

It runs with the following warning:

Warning message in spam_random(nrow = 1900000, ncol = 1900000, density = 0.001):
"integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'"

until it eventually fails with the following error message:

Error in if (rowp[i] == rowp[i + 1L]) next: missing value where TRUE/FALSE needed
Traceback:

1. system.time(test_matrix <- spam_random(nrow = 1900000, ncol = 1900000, 
 .     density = 0.001))
2. spam_random(nrow = 1900000, ncol = 1900000, density = 0.001)
Timing stopped at: 1657 228.3 1903
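The spam warning and error appear to share a cause. A small base-R reproduction, assuming (from the names in the messages, not from reading spam's source) that `spam_random` accumulates per-row counts into a row-pointer vector `rowp` with an integer `cumsum`. Integer `cumsum` turns into `NA` once the running total passes `.Machine$integer.max`, and an `NA` inside `if()` raises exactly the error shown:

```r
# Requested entry count for the spam call, computed in double precision
spam_nnz <- 1900000 * 1900000 * 0.001   # about 3.61e9
spam_nnz > .Machine$integer.max         # TRUE

# Integer cumsum overflows and yields NA (R warns about this)
counts <- c(.Machine$integer.max, 1L)   # two illustrative integer row counts
rowp   <- suppressWarnings(cumsum(counts))
is.na(rowp[2])                          # TRUE

# An NA row pointer makes the if() condition NA
res <- tryCatch(
  if (rowp[1] == rowp[2]) "empty row" else "non-empty row",
  error = conditionMessage
)
res

# Summing in double precision, as the warning suggests, avoids the overflow
cumsum(as.numeric(counts))
```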