r - Proper way to subset a big.matrix

Question

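I would like to know whether there is a "proper" way to subset big.matrix objects in R. Subsetting the matrix itself is simple, but the class of the result always reverts to "matrix". That is not a problem for a small dataset like this one, but with extremely large datasets the subset could still benefit from the "big.matrix" class.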

require(bigmemory)
data(iris)
# I realize the warning about factors but not important for this example
big <- as.big.matrix(iris)

class(big)
[1] "big.matrix"
attr(,"package")
[1] "bigmemory"

class(big[,c("Sepal.Length", "Sepal.Width")])
[1] "matrix"

class(big[,1:2])
[1] "matrix"

score 3 · Accepted Answer

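I have since learned that the "proper" way to subset a big.matrix is to use sub.big.matrix, although this only works for contiguous columns and/or rows. Non-contiguous subsetting is not currently implemented.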

sm <- sub.big.matrix(big, firstCol=1, lastCol=2)
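
To make the contiguity restriction and the sharing behaviour concrete, here is a minimal sketch. It assumes the default shared big.matrix and uses only the numeric iris columns (my choice, to avoid the factor warning from the question). The row and column ranges are arbitrary illustrations.

```r
library(bigmemory)

# Numeric columns only, to sidestep the factor warning from the question.
big <- as.big.matrix(as.matrix(iris[, 1:4]))

# Contiguous block: rows 5 to 14, columns 1 to 2.
sm <- sub.big.matrix(big, firstRow = 5, lastRow = 14,
                     firstCol = 1, lastCol = 2)

is.big.matrix(sm)  # the subset keeps the big.matrix class
dim(sm)

# The sub-matrix refers to the parent's data rather than copying it,
# so a write through sm is also visible in big.
sm[1, 1] <- -1
big[5, 1]
```

Because sm is a window onto big rather than a copy, it costs almost no extra memory, but any modification made through it also changes the parent.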

score 0 · Accepted Answer

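This does not seem to be possible without calling as.big.matrix on the subset.

From the big.matrix documentation:

"If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x."

I presume this applies to columns as well. So it seems you would need to call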

a <- as.big.matrix(big[,1:2])

in order for the subset to be a big.matrix object as well.

class(a)
# [1] "big.matrix"
# attr(,"package")
# [1] "bigmemory"
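
The same round trip also handles non-contiguous columns, at the cost of first materialising the subset as an ordinary R matrix in RAM. The sketch below illustrates this and shows that the result is an independent copy. It again uses only the numeric iris columns, which is my own simplification. The deepcopy() call at the end does not come from this thread: as far as I know, bigmemory's deepcopy() accepts cols and rows arguments, so treat it as a suggestion to check against the package documentation.

```r
library(bigmemory)

# Numeric columns only, to sidestep the factor warning from the question.
big <- as.big.matrix(as.matrix(iris[, 1:4]))

# Non-contiguous columns: extract as a regular matrix, then convert back.
a <- as.big.matrix(big[, c(1, 3)])
is.big.matrix(a)

# a is a separate copy, so writing to it leaves big untouched.
a[1, 1] <- -1
big[1, 1]

# Alternative (not from the thread): copy selected columns directly
# into a new big.matrix with deepcopy().
d <- deepcopy(big, cols = c(1, 3))
is.big.matrix(d)
```

Either way the subset is a new object with its own storage, unlike a sub.big.matrix, which shares storage with its parent.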
