I am getting different results when running Randomized PCA
on sparse and dense matrices:
import numpy as np
import scipy.sparse as scsp
from sklearn.decomposition import RandomizedPCA
x = np.matrix([[1,2,3,2,0,0,0,0],
[2,3,1,0,0,0,0,3],
[1,0,0,0,2,3,2,0],
[3,0,0,0,4,5,6,0],
[0,0,4,0,0,5,6,7],
[0,6,4,5,6,0,0,0],
[7,0,5,0,7,9,0,0]])
csr_x = scsp.csr_matrix(x)
s_pca = RandomizedPCA(n_components=2)
s_pca_scores = s_pca.fit_transform(csr_x)
s_pca_weights = s_pca.explained_variance_ratio_
d_pca = RandomizedPCA(n_components=2)
d_pca_scores = d_pca.fit_transform(x)
d_pca_weights = d_pca.explained_variance_ratio_
print 'sparse matrix scores {}'.format(s_pca_scores)
print 'dense matrix scores {}'.format(d_pca_scores)
print 'sparse matrix weights {}'.format(s_pca_weights)
print 'dense matrix weights {}'.format(d_pca_weights)
Result:
sparse matrix scores [[ 1.90912166 2.37266113]
[ 1.98826835 0.67329466]
[ 3.71153199 -1.00492408]
[ 7.76361811 -2.60901625]
[ 7.39263662 -5.8950472 ]
[ 5.58268666 7.97259172]
[ 13.19312194 1.30282165]]
dense matrix scores [[-4.23432815 0.43110596]
[-3.87576857 -1.36999888]
[-0.05168291 -1.02612363]
[ 3.66039297 -1.38544473]
[ 1.48948352 -7.0723618 ]
[-4.97601287 5.49128164]
[ 7.98791603 4.93154146]]
sparse matrix weights [ 0.74988508 0.25011492]
dense matrix weights [ 0.55596761 0.44403239]
The dense version gives the same results as ordinary PCA, but what is going on when the matrix is sparse? Why are the results different?
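One thing that may be worth checking is whether the data is mean-centered in both cases. The sketch below is NumPy-only and assumes (this is an assumption, not something confirmed above) that the sparse path skips centering while the dense path subtracts the column means. The helper `top_k_scores` is hypothetical and uses an exact SVD rather than the randomized solver, so the numbers need not match the output above digit for digit.

```python
import numpy as np

x = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)


def top_k_scores(data, k=2, center=True):
    """Hypothetical helper: project data onto its top-k singular directions.

    With center=True the column means are subtracted first (ordinary PCA).
    With center=False the raw matrix is decomposed (a plain truncated SVD).
    """
    if center:
        data = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    scores = u[:, :k] * s[:k]
    variances = s[:k] ** 2 / data.shape[0]
    # Normalized over the k kept components only, so the ratios sum to 1.
    ratios = variances / variances.sum()
    return scores, ratios


centered_scores, centered_ratios = top_k_scores(x, center=True)
raw_scores, raw_ratios = top_k_scores(x, center=False)

print('centered scores\n{}'.format(centered_scores))
print('uncentered scores\n{}'.format(raw_scores))
print('centered ratios {}'.format(centered_ratios))
print('uncentered ratios {}'.format(raw_ratios))
```

If the uncentered scores resemble the sparse output (up to sign) and the centered ones resemble the dense output, the missing centering step would account for the discrepancy.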