我想对大型(稀疏)矩阵执行奇异值分解。为了选择最好(最准确)的库,我尝试使用不同的 Java 和 Python 库复制此处提供的 SVD 示例。奇怪的是,我对每个库都得到了不同的结果。
这是原始示例矩阵,它是分解的(US 和 VT)矩阵:
A =2.0 0.0 8.0 6.0 0.0
1.0 6.0 0.0 1.0 7.0
5.0 0.0 7.0 4.0 0.0
7.0 0.0 8.0 5.0 0.0
0.0 10.0 0.0 0.0 7.0
U =-0.54 0.07 0.82 -0.11 0.12
-0.10 -0.59 -0.11 -0.79 -0.06
-0.53 0.06 -0.21 0.12 -0.81
-0.65 0.07 -0.51 0.06 0.56
-0.06 -0.80 0.09 0.59 0.04
VT =-0.46 0.02 -0.87 -0.00 0.17
-0.07 -0.76 0.06 0.60 0.23
-0.74 0.10 0.28 0.22 -0.56
-0.48 0.03 0.40 -0.33 0.70
-0.07 -0.64 -0.04 -0.69 -0.32
S (with the top three singular values) =
17.92 0 0
0 15.17 0
0 0 3.56
我尝试使用以下 Java 和 Python 库:Java:PColt、Jama Python:NumPy
以下是他们每个人的结果:
Jama:
U = 0.5423 -0.065 -0.8216 0.1057 -0.1245
0.1018 0.5935 0.1126 0.7881 0.0603
0.525 -0.0594 0.213 -0.1157 0.8137
0.6449 -0.0704 0.5087 -0.0599 -0.5628
0.0645 0.7969 -0.09 -0.5922 -0.0441
VT =0.4646 -0.0215 0.8685 8.0E-4 -0.1713
0.0701 0.76 -0.0631 -0.6013 -0.2278
0.7351 -0.0988 -0.284 -0.2235 0.565
0.4844 -0.0254 -0.3989 0.3327 -0.7035
0.065 0.6415 0.0443 0.6912 0.3233
S = 17.9184 0.0 0.0 0.0 0.0
0.0 15.1714 0.0 0.0 0.0
0.0 0.0 3.564 0.0 0.0
0.0 0.0 0.0 1.9842 0.0
0.0 0.0 0.0 0.0 0.3496
PColt:
U = -0.542255 0.0649957 0.821617 0.105747 -0.124490
-0.101812 -0.593461 -0.112552 0.788123 0.0602700
-0.524953 0.0593817 -0.212969 -0.115742 0.813724
-0.644870 0.0704063 -0.508744 -0.0599027 -0.562829
-0.0644952 -0.796930 0.0900097 -0.592195 -0.0441263
VT =-0.464617 0.0215065 -0.868509 0.000799554 -0.171349
-0.0700860 -0.759988 0.0630715 -0.601346 -0.227841
-0.735094 0.0987971 0.284009 -0.223485 0.565040
-0.484392 0.0254474 0.398866 0.332684 -0.703523
-0.0649698 -0.641520 -0.0442743 0.691201 0.323284
S =
(00) 17.91837085874625
(11) 15.17137188041607
(22) 3.5640020352605677
(33) 1.9842281528992616
(44) 0.3495556671751232
Numpy
U = -0.54225536 0.06499573 0.82161708 0.10574661 -0.12448979
-0.10181247 -0.59346055 -0.11255162 0.78812338 0.06026999
-0.52495325 0.05938171 -0.21296861 -0.11574223 0.81372354
-0.64487038 0.07040626 -0.50874368 -0.05990271 -0.56282918
-0.06449519 -0.79692967 0.09000966 -0.59219473 -0.04412631
VT =-4.64617e-01 2.15065e-02 -8.68508e-01 7.99553e-04 -1.71349e-01
-7.00859e-02 -7.59987e-01 6.30714e-02 -6.01345e-01 -2.27841e-01
-7.35093e-01 9.87971e-02 2.84008e-01 -2.23484e-01 5.65040e-01
-4.84391e-01 2.54473e-02 3.98865e-01 3.32683e-01 -7.03523e-01
-6.49698e-02 -6.41519e-01 -4.42743e-02 6.91201e-01 3.23283e-01
S = 17.91837086 15.17137188 3.56400204 1.98422815 0.34955567
可以注意到 Jama 分解矩阵 (u & VT) 中每个元素的符号与原始示例中的符号相反。有趣的是,对于 PColt 和 Numpy,只有最后两列元素的符号被反转。倒置标志背后有什么具体原因吗?有人遇到过类似的差异吗?
以下是我使用的代码片段:Java
import java.text.DecimalFormat;
import cern.colt.matrix.tdouble.DoubleMatrix2D;
import cern.colt.matrix.tdouble.algo.DenseDoubleAlgebra;
import cern.colt.matrix.tdouble.algo.decomposition.DenseDoubleSingularValueDecomposition;
import cern.colt.matrix.tdouble.impl.DenseDoubleMatrix2D;
import Jama.Matrix;
import Jama.SingularValueDecomposition;
public class SVD_Test implements java.io.Serializable{
public static void main(String[] args)
{
double[][] data2 = new double[][]
{{ 2.0, 0.0, 8.0, 6.0, 0.0},
{ 1.0, 6.0, 0.0, 1.0, 7.0},
{ 5.0, 0.0, 7.0, 4.0, 0.0},
{ 7.0, 0.0, 8.0, 5.0, 0.0},
{ 0.0, 10.0, 0.0, 0.0, 7.0}};
DoubleMatrix2D pColt_matrix = new DenseDoubleMatrix2D(5,5);
pColt_matrix.assign(data2);
Matrix j = new Matrix(data2);
SingularValueDecomposition svd_jama = j.svd();
DenseDoubleSingularValueDecomposition svd_pColt = new DenseDoubleSingularValueDecomposition(pColt_matrix, true, true);
System.out.println("U:");
System.out.println("pColt:");
System.out.println(svd_pColt.getU());
printJamaMatrix(svd_jama.getU());
System.out.println("S:");
System.out.println("pColt:");
System.out.println(svd_pColt.getS());
printJamaMatrix(svd_jama.getS());
System.out.println("V:");
System.out.println("pColt:");
System.out.println(svd_pColt.getV());
printJamaMatrix(svd_jama.getV());
}
public static void printJamaMatrix(Matrix inp){
System.out.println("Jama: ");
System.out.println(String.valueOf(inp.getRowDimension())+" X "+String.valueOf(inp.getColumnDimension()));
DecimalFormat twoDForm = new DecimalFormat("#.####");
StringBuffer sb = new StringBuffer();
for (int r = 0; r < inp.getRowDimension(); r++) {
for (int c = 0; c < inp.getColumnDimension(); c++)
sb.append(Double.valueOf(twoDForm.format(inp.get(r, c)))).append("\t");
sb.append("\n");
}
System.out.println(sb.toString());
}
}
Python:
>>> import numpy
>>> numpy_matrix = numpy.array([[ 2.0, 0.0, 8.0, 6.0, 0.0],
[1.0, 6.0, 0.0, 1.0, 7.0],
[5.0, 0.0, 7.0, 4.0, 0.0],
[7.0, 0.0, 8.0, 5.0, 0.0],
[0.0, 10.0, 0.0, 0.0, 7.0]])
>>> u,s,v = numpy.linalg.svd(numpy_matrix, full_matrices=True)
代码有问题吗?.