
I'm looking for a fast NMF implementation for sparse matrices in R.

The R NMF package consists of a number of algorithms, none of which impress in terms of computational time.

NNLM::nnmf() seems to be the state of the art in R at the moment, specifically with method = "scd" and loss = "mse", which is implemented as alternating least squares solved by sequential coordinate descent. However, this method is quite slow on very large, very sparse matrices.
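For readers unfamiliar with the approach named above, here is a minimal sketch of alternating least squares where each non-negative least squares subproblem is solved by cyclic coordinate descent. It is plain Python on small dense lists purely for illustration. It is not NNLM's actual implementation, and every function name and default (`nnls_cd`, `update_factor`, `nmf_als_cd`, `sweeps`, `iters`) is my own assumption.

```python
import random

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def nnls_cd(G, b, x, sweeps=10):
    """Approximately minimize 0.5*x'Gx - b'x subject to x >= 0, in place,
    by cyclic coordinate descent (G = W'W, b = W'a for one column a)."""
    k = len(b)
    for _ in range(sweeps):
        for i in range(k):
            if G[i][i] <= 0.0:
                x[i] = 0.0  # an all-zero factor column cannot affect the fit
                continue
            grad = sum(G[i][l] * x[l] for l in range(k)) - b[i]
            x[i] = max(0.0, x[i] - grad / G[i][i])  # exact 1-D minimizer, clipped at 0
    return x

def update_factor(A, W, H, sweeps):
    """Hold W (m x k) fixed and refine H (k x n) so that W @ H approximates A."""
    Wt = transpose(W)
    G = matmul(Wt, W)              # k x k Gram matrix, shared by all columns
    Bt = transpose(matmul(Wt, A))  # row j holds W'a_j
    Ht = transpose(H)              # row j holds column j of H
    for j in range(len(Ht)):
        nnls_cd(G, Bt[j], Ht[j], sweeps)
    return transpose(Ht)

def nmf_als_cd(A, k, iters=50, sweeps=10, seed=0):
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        H = update_factor(A, W, H, sweeps)
        # A' ~ H'W', so the same routine updates W on the transposed problem
        W = transpose(update_factor(transpose(A), transpose(H), transpose(W), sweeps))
    return W, H

def mse(A, W, H):
    R = matmul(W, H)
    m, n = len(A), len(A[0])
    return sum((A[i][j] - R[i][j]) ** 2 for i in range(m) for j in range(n)) / (m * n)

# Usage on a tiny non-negative matrix
A = [[1, 2, 0], [0, 1, 3], [1, 3, 3], [2, 4, 0]]
W, H = nmf_als_cd(A, k=2)
print(mse(A, W, H))
```

Note that the Gram matrix and the cross-products touch every entry of A, zeros included, which is where a fast implementation has to exploit sparse storage.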

The rsparse::WRMF function is extremely fast, but only because it uses just the positive values in A for the row-wise computation of W and H.
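To make the distinction concrete, the sketch below contrasts a squared-error loss over every entry of A with one restricted to its positive entries. It takes the description above at face value and is not rsparse's code; the function names `full_sse` and `observed_sse` are hypothetical.

```python
def predict(W, H, i, j):
    # (W @ H)[i][j] for W (m x k) and H (k x n) stored as nested lists
    return sum(W[i][l] * H[l][j] for l in range(len(H)))

def full_sse(A, W, H):
    """Squared error over every entry of A, zeros included."""
    return sum((A[i][j] - predict(W, H, i, j)) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

def observed_sse(A, W, H):
    """Squared error over the strictly positive entries of A only."""
    return sum((A[i][j] - predict(W, H, i, j)) ** 2
               for i in range(len(A)) for j in range(len(A[0]))
               if A[i][j] > 0)

# A rank-1 model that predicts 4 everywhere
A = [[5, 0], [0, 3]]
W = [[1.0], [1.0]]
H = [[4.0, 4.0]]
print(full_sse(A, W, H), observed_sse(A, W, H))
```

A model fitted under the second loss is never penalized for predicting large values where A is zero, so it solves a different problem from NMF of the full matrix, which is why the speed is not directly comparable.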

Is there any reasonable implementation for solving NMF on a sparse matrix?

Is there an equivalent to scikit-learn in R? See this question

There are various worker functions in R, such as fnnls and tsnnls, none of which surpass nnls::nnls (written in Fortran). I have been unable to build any of these functions into a faster NMF framework.


1 Answer


Forgot I even posted this question, but one year later...

I wrote a very fast implementation of NMF in RcppEigen, see the RcppML R package on CRAN.

install.packages("RcppML")

# for the development version
devtools::install_github("zdebruine/RcppML")

?RcppML::nmf

It's at least an order of magnitude faster than NNLM::nnmf. For comparison, the runtime of RcppML::nmf rivals that of irlba::irlba SVD (although that's an altogether different algorithm).

I've successfully applied my implementation to a 96% sparse matrix of 1.3 million single cells by 26,000 genes, computing a rank-100 factorization in 1 minute. I think that's very reasonable.

answered 2021-08-02T16:49:44.483