It sounds like bigmemory may have enough functionality to solve your problem.
require(bigmemory)
Reading the file
You can read the file in as a big.matrix with:
read.big.matrix(filename, sep = ",", header = FALSE, col.names = NULL,
row.names = NULL, has.row.names = FALSE, ignore.row.names = FALSE,
type = NA, skip = 0, separated = FALSE, backingfile = NULL,
backingpath = NULL, descriptorfile = NULL, binarydescriptor = FALSE,
extraCols = NULL, shared = TRUE)
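As a minimal sketch of how that call might look in practice, the following writes a small numeric CSV to a temporary location and reads it back as a file-backed big.matrix. The file names `iris_demo.bin` and `iris_demo.desc` are hypothetical, and the first four columns of iris stand in for your real data.

```r
library(bigmemory)

# Stand-in for the real input: a headerless, purely numeric CSV file.
csv_path <- tempfile(fileext = ".csv")
write.table(iris[, 1:4], csv_path, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Read it as a file-backed big.matrix so the data lives on disk, not in R's heap.
# The backing and descriptor file names below are hypothetical.
x <- read.big.matrix(csv_path, sep = ",", header = FALSE, type = "double",
                     backingfile = "iris_demo.bin",
                     backingpath = tempdir(),
                     descriptorfile = "iris_demo.desc")

dim(x)    # number of rows and columns read
x[1:3, ]  # indexing pulls the requested rows back as an ordinary matrix
```

A big.matrix holds a single type, so non-numeric columns would need to be recoded or dropped before reading.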
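Memory savings
Even with a simple example like iris, you can see the memory savings. Keep in mind that object.size() only measures the small R-side handle of a big.matrix; the data itself is stored outside R's heap (or in a backing file), so the figures below show what R has to hold, not the total footprint.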
x <- as.big.matrix(iris)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- c("A", "B", "C", "D", "E")
object.size(x)
# 664 bytes
object.size(iris)
# 7088 bytes
Subsetting
Subsetting big.matrices is limited, but some functionality is provided by mwhich.
Subset the rows where column 1 is <= 5 AND column 2 is <= 4:
x[mwhich(x, 1:2, list(c(5), c(4)), list(c('le'), c('le')), 'AND'),]
# A B C D E
# 2 4.9 3.0 1.4 0.2 1
# 3 4.7 3.2 1.3 0.2 1
# 4 4.6 3.1 1.5 0.2 1
# 5 5.0 3.6 1.4 0.2 1
# etc
Note that the result of a subset operation is a regular matrix. You can convert a regular matrix to a big.matrix with as.big.matrix().
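The round trip described above can be sketched as follows. This is only an illustration, assuming the four numeric columns of iris as data; the variable names are made up for the example.

```r
library(bigmemory)

x <- as.big.matrix(as.matrix(iris[, 1:4]))

# Row indices where column 1 <= 5 AND column 2 <= 4.
rows <- mwhich(x, cols = 1:2, vals = list(5, 4),
               comps = list("le", "le"), op = "AND")

sub <- x[rows, ]         # indexing returns an ordinary R matrix
is.matrix(sub)

y <- as.big.matrix(sub)  # convert the subset back to a big.matrix
is.big.matrix(y)
```

Converting back only makes sense if the subset is itself large; a small result is usually easier to keep as a regular matrix.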
Min, max, mean, etc.
biganalytics provides more functionality for big.matrices.
require(biganalytics)
colmin(x, cols = 1:2, na.rm = FALSE)
# A B
# 4.3 2.0
colmax(x, cols = 1:2, na.rm = FALSE)
# A B
# 7.9 4.4
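The same pattern extends to the other column summaries in biganalytics. A short sketch, again assuming the numeric columns of iris as data, using colmean, colsd and colrange:

```r
library(bigmemory)
library(biganalytics)

x <- as.big.matrix(as.matrix(iris[, 1:4]))

# Column-wise summaries computed on the big.matrix itself.
m <- colmean(x, cols = 1:2, na.rm = FALSE)
s <- colsd(x, cols = 1:2, na.rm = FALSE)
r <- colrange(x, cols = 1:2, na.rm = FALSE)

m
s
r
```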
Output
Finally, you can write the big.matrix out to a file with:
write.big.matrix(...)
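Since the call above leaves its arguments elided, here is one possible way to fill them in, as a sketch only. The output path is a temporary file chosen for illustration, and the separator and header settings are assumptions you would adapt to your own format.

```r
library(bigmemory)

x <- as.big.matrix(as.matrix(iris[, 1:4]))

# Write the matrix as a headerless, comma-separated file (path is illustrative).
out_path <- tempfile(fileext = ".csv")
write.big.matrix(x, out_path, row.names = FALSE, col.names = FALSE, sep = ",")

# Read it back with base R to check the contents.
back <- read.csv(out_path, header = FALSE)
head(back)
```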