r - Matrix factorization with ALS has very slow performance with implicit zeros for large matrices in R

Question

I want to perform a Matrix factorization with alternating least squares (ALS) in R. While the code is working fine for small matrices, it is incredible slow for larger matrices. I would appreciate any help in speeding up the process. I am using RRopen 8.01, therefore it is already running on multiple cores using MKL.

I am utilizing a binary matrix as implicit feedback matrix. Furthermore I implemented a weighting matrix.

## Matrix Factorization with Alternating Least Squares
## R is u * v binary matrix,
## W is u * v weighting matrix
## U is u * k user feature matrix, 
## V is v * k item feature matrix
## u is the number of users,
## v is the number of items,
## k is the number of features
## iter is the number of iterations

Here is what I did:

# implicit feedback data matrix.
R <- matrix(nr=2, nc=5, data=rbinom(2*5,1, prob=.2))
W <- matrix(nr=2, nc=5, data=rbinom(2*5,7, prob=.2))

I set the following parameter:

k <- 20
its <- 10

Create the initial matrices for users and items

# initial users matrix.
U <- matrix(nr= nrow(R), nc=k, data=5 *rnorm(nrow(R)*k))
# initial items matrix.
V <- matrix(nr=k, nc=ncol(R), data=5* rnorm(ncol(R)*k))

And now I perform the Matrix Factorization with ALS

w.err <- NULL
for(iter in 1:its) {
  # update users
  for(i in 1:nrow(R)) {
    U[i,] <- t(solve(V %*% (diag(R[i,])%*% t(V)) + 0.1 * diag(k),
                     as.vector(V %*% as.vector(t(W[i,])%*% diag(R[i,])))))
  }
  # update items
  for(j in 1:ncol(R)){
    V[,j] <- solve(t(U) %*% (diag(R[,j]) %*% U) + 0.1 * diag(k),
                   t(U) %*% (diag(R[,j]) %*% W[, j]))
  }
  R.hat <- U %*% V
  w.err[iter] <- sum((R* (W-U%*%V))^2) 
}

R.hat is the desired end matrix. w.err is just a control for the errors over the iterations. Nice for plotting :)

The code as it is works fine. Just when I increase the number of rows and columns in R (and W), the performance decrease significantly. While it is fine for let's say nr=200, nr=500, it is already running for two hours for nr=2000, nr=5000(and not finished yet) on an 8 core 2.67 Ghz machine.

I didn't use the NMF or the NMFN package since negative values are possible, accordingly it is not an non-negative MF. Does anyone has an idea how to increase performance? Maybe I am just stupid an my code is nonsense, i would be happy if you could point out improvements.

I looked for similar questions but couldn't find one. Maybe I just overlooked it.

r - Matrix factorization with ALS has very slow performance with implicit zeros for large matrices in R

0 回答 0

Related

Reference