r - Performing non-negative matrix factorization in R

Question

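I have a sparse matrix in R.

I now want to perform non-negative matrix factorization on this matrix.

data.txt is a text file I created with Python. It has 3 columns: the first gives the row number, the second the column number, and the third the value.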

data.txt

1 5 10
3 2 5
4 6 9

The original data.txt has 164009 lines, which are the data for a 250000x250000 sparse matrix.

I am using the NMF library, and I am doing:

x=scan('data.txt',what=list(integer(),integer(),numeric()))
library('Matrix')
library('NMF')
R=sparseMatrix(i=x[[1]],j=x[[2]],x=x[[3]])
res<-nmf(R,3)

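It gives me an error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function "nmf" for signature "dgCMatrix", "missing", "missing"

Can anyone help me figure out what I am doing wrong?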
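A minimal sketch of how sparseMatrix() sizes the matrix built from these triplets. It reads the three sample lines shown above, using scan()'s text= argument in place of data.txt only so the example is self-contained (an assumption, not the asker's setup). Without a dims argument, the size is inferred from the largest row and column indices, so the sample data gives a 4 x 6 matrix rather than 250000 x 250000.

```r
library(Matrix)

# The three sample lines of data.txt, parsed the same way as in the question
# (text= stands in for the file so the sketch runs on its own).
x <- scan(text = "1 5 10\n3 2 5\n4 6 9",
          what = list(integer(), integer(), numeric()), quiet = TRUE)

# Without 'dims', the size comes from the largest indices: 4 x 6.
R_small <- sparseMatrix(i = x[[1]], j = x[[2]], x = x[[3]])

# With 'dims', the matrix has the stated full size but still stores only 3 values.
R_full <- sparseMatrix(i = x[[1]], j = x[[2]], x = x[[3]],
                       dims = c(250000, 250000))

print(dim(R_small))
print(dim(R_full))
print(nnzero(R_full))
```

Passing dims matters mostly when trailing rows or columns are empty; it does not change anything about which nmf() methods accept the object.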

score 4 · Accepted Answer

The first problem is that you are passing nmf a dgCMatrix.

> class(R)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

The help page is here:

help(nmf)

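See the Methods section: it expects an ordinary (dense) matrix. Given the number of entries, coercing with as.matrix is probably not going to help you much.

Now, even with your example data, coercing to a matrix is not enough: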
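To put "the number of entries" in numbers: assuming a dense double-precision matrix (8 bytes per entry, R's numeric type) of the size stated in the question, the storage and the density of the 164009 non-zero values work out as follows.

```latex
\begin{aligned}
250\,000 \times 250\,000 &= 6.25 \times 10^{10} \ \text{entries} \\
6.25 \times 10^{10} \times 8 \ \text{bytes} &= 5 \times 10^{11} \ \text{bytes} \approx 500 \ \text{GB} \\
\frac{164\,009}{6.25 \times 10^{10}} &\approx 2.6 \times 10^{-6} \ \text{(fraction of non-zero entries)}
\end{aligned}
```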

> nmf(as.matrix(R))
Error: NMF::nmf : when argument 'rank' is not provided, argument 'seed' is required to inherit from class 'NMF'. See ?nmf.

Let's give it a rank:

> nmf(as.matrix(R),2)
Error in .local(x, rank, method, ...) : 
  Input matrix x contains at least one null row.

And indeed it does (row 2 is entirely zero):

> R
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] . . . . 10 .
[2,] . . . .  . .
[3,] . 5 . .  . .
[4,] . . . .  . 9
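A minimal sketch of the cleanup the last error calls for, using only the Matrix package and the three example triplets: drop the all-zero rows before densifying. Dropping the all-zero columns as well is an extra assumption here (they carry no information for the factorization), not something the error message requires.

```r
library(Matrix)

# The example matrix from the question: (row, column, value) triplets.
R <- sparseMatrix(i = c(1, 3, 4), j = c(5, 2, 6), x = c(10, 5, 9))

# Keep rows and columns with at least one non-zero entry.
# drop = FALSE keeps a matrix even if only one row or column survives.
keep_rows <- rowSums(R != 0) > 0
keep_cols <- colSums(R != 0) > 0
Rc <- R[keep_rows, keep_cols, drop = FALSE]

# Dense copy, since NMF::nmf() has no method for dgCMatrix.
M <- as.matrix(Rc)
print(dim(M))
print(M)
```

The resulting 3 x 3 matrix M has no null rows, so a call such as nmf(M, 2) should get past the null-row check; for the full 250000 x 250000 data the dense step remains the bottleneck.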

score 1 · Accepted Answer

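There is now an excellent NMF package available: https://cran.r-project.org/web/packages/NMF/NMF.pdf

It provides various heatmaps, purity/entropy measures, a range of NMF algorithms (Brunet, Lee, sNMF, nsNMF, Euclidean/KL divergence, etc.), and a framework for creating your own.

Try: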

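library(NMF)
library(Matrix)
# data.txt holds (row, column, value) triplets, so build the matrix from them
trip = read.table('data.txt', col.names = c('i', 'j', 'v'))
# dense copy: only feasible if the full matrix fits in memory
x = as.matrix(sparseMatrix(i = trip$i, j = trip$j, x = trip$v))
# nmf() rejects all-zero rows; drop them (and all-zero columns) first
x = x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]
# estimate rank
estim.x = nmf(x, 2:5, nrun=50, method = 'nsNMF', seed = 'random', .options = "v")
# plot clustering accuracy
plot(estim.x, what = c("cophenetic", "dispersion"))
# inspect consensus matrices
consensusmap(estim.x)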

score 1 · Accepted Answer

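Almost 10 years later, there are solutions. Here is a fast one.

If you have a 250k x 250k dgCMatrix with a density of around 1%, you need a sparse factorization algorithm.

I wrote RcppML::nmf for large sparse matrices: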

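library(Matrix)
library(RcppML)
# random 1000 x 10000 sparse matrix, 1% non-zero, with non-negative values
A <- rsparsematrix(1000, 10000, 0.01, rand.x = runif)
model <- RcppML::nmf(A, k = 10)
str(model)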

This should take a few seconds on a laptop.

You could also check out rsparse::WRMF, though it is not as fast.
