
I am using the PCAFast method from the mlpy library in Python (http://mlpy.sourceforge.net/docs/3.2/dim_red.html).

The method is executed pretty fast when it learns a feature matrix generated as follows:

import numpy as np

x = np.random.rand(100, 100)

Sample output of this command is:

[[ 0.5488135   0.71518937  0.60276338 ...,  0.02010755  0.82894003
   0.00469548]
 [ 0.67781654  0.27000797  0.73519402 ...,  0.25435648  0.05802916
   0.43441663]
 [ 0.31179588  0.69634349  0.37775184 ...,  0.86219152  0.97291949
   0.96083466]
 ..., 
 [ 0.89111234  0.26867428  0.84028499 ...,  0.5736796   0.73729114
   0.22519844]
 [ 0.26969792  0.73882539  0.80714479 ...,  0.94836806  0.88130699
   0.1419334 ]
 [ 0.88498232  0.19701397  0.56861333 ...,  0.75842952  0.02378743
   0.81357508]]

However, when the feature matrix x consists of data such as the following:

x = 7.55302582e-05*np.ones((n, d[i]))

Sample output:

[[  7.55302582e-05   7.55302582e-05   7.55302582e-05 ...,   7.55302582e-05
    7.55302582e-05   7.55302582e-05]
 [  7.55302582e-05   7.55302582e-05   7.55302582e-05 ...,   7.55302582e-05
    7.55302582e-05   7.55302582e-05]
 [  7.55302582e-05   7.55302582e-05   7.55302582e-05 ...,   7.55302582e-05
    7.55302582e-05   7.55302582e-05]
 ..., 
 [  7.55302582e-05   7.55302582e-05   7.55302582e-05 ...,   7.55302582e-05
    7.55302582e-05   7.55302582e-05]
 [  7.55302582e-05   7.55302582e-05   7.55302582e-05 ...,   7.55302582e-05
    7.55302582e-05   7.55302582e-05]
 [  7.55302582e-05   7.55302582e-05   7.55302582e-05 ...,   7.55302582e-05
    7.55302582e-05   7.55302582e-05]]
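A self-contained sketch of the comparison (with hypothetical sizes standing in for n and d[i], and the same assumed PCAFast API as above):

import time
import numpy as np
import mlpy

n, d = 100, 100   # hypothetical sizes standing in for n and d[i]

for label, x in [("random", np.random.rand(n, d)),
                 ("constant", 7.55302582e-05 * np.ones((n, d)))]:
    pca = mlpy.PCAFast(k=2)    # PCAFast API assumed from the linked mlpy docs
    t0 = time.time()
    pca.learn(x)
    print(label, "took", time.time() - t0, "seconds")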

The method becomes very slow. Why does this happen? Does this have something to do with the type of data stored in the x feature matrix?

Any ideas on how to solve this?


1 Answer


This is a terrible (degenerate, hence poorly conditioned) matrix to run principal components analysis on. All of its rows are identical, so you can subtract rows from one another to get a degenerate matrix: all but one of its eigenvalues are exactly zero, which by itself may be problematic. The eigensystem solver behind this method may be a fragile one that relies on the matrix being reasonably regular (all eigenvalues distinct and sufficiently well separated from zero and from each other).

I am not familiar with the method, but my feeling, based on the "Fast Fixed Point" part of the name, is that it relies on matrix powers blowing up the leading eigenvalue: if A = Σ_k λ_k u_k u_k' for the appropriate orthonormal vectors u_k, and λ1 > λ2 > … > λp > 0, then A^n ≈ λ1^n u1 u1' for a sufficiently large power n. This idea simply does not work when you feed in an array of identical values: the eigenvalues never separate, so the iteration has nothing to converge to. Worse, for the specific matrix you feed in, repeated powers of 7.55·10^-5 shrink extremely fast ((7.55·10^-5)^20 is already on the order of 10^-83), heading toward the limits of what a double precision number can represent. In the end, you may get complete garbage in the output.

There are computational linear algebra methods that are much more numerically stable and reliable. Which one to implement is the developer's judgment call; besides speed, one also needs to think about how robust the method is. No offense, but I would take a stable slow method over a quick and dirty one on most occasions.
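To see the degeneracy concretely, here is a small numpy sketch (independent of mlpy) comparing the covariance eigenvalues of the two inputs; for the constant matrix the spectrum is all zeros, so there is no leading eigenvalue for a separation-based method to latch onto:

import numpy as np

x_random = np.random.rand(100, 100)
x_const = 7.55302582e-05 * np.ones((100, 100))

for label, x in [("random", x_random), ("constant", x_const)]:
    cov = np.cov(x, rowvar=False)        # sample covariance of the features
    eigvals = np.linalg.eigvalsh(cov)    # eigenvalues in ascending order
    print(label, "largest:", eigvals[-1], "smallest:", eigvals[0])

As a general (not mlpy-specific) suggestion: if your data can be constant or nearly constant, checking for zero-variance columns before calling learn(), or falling back to an SVD-based PCA that handles rank-deficient input gracefully, may be the safer route.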

answered 2015-07-16T16:38:51.097