matlab - 是否可以使用预先计算的分解来加速具有稀疏矩阵的反斜杠\mldivide

Question

我执行求解线性方程组的多次迭代：Mx=b使用大而稀疏的 M。M 在迭代之间不会改变，但 b 会改变。我尝试了几种方法，到目前为止发现反斜杠\mldivide 是最有效和最准确的。

以下代码与我正在做的非常相似：

for ii=1:num_iter
  x = M\x;
  x = x+dx;
end

现在我想通过利用 M 是固定的事实来进一步加速计算。

设置标志spparms('spumoni',2)允许求解器算法的详细信息。

我运行了以下代码：

spparms('spumoni',2);
x = M\B;

输出（监控）：

sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
    Matrix entry defined as: double
    Int (generic integer) defined as: UF_long

    0: print level: 2
    1: dense row parameter:    0.2
        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
    2: dense column parameter: 0.2
        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
    3: pivot tolerance: 0.1
    4: block size for dense matrix kernels: 32
    5: strategy: 0 (auto)
    6: initial allocation ratio: 0.7
    7: max iterative refinement steps: 2
    12: 2-by-2 pivot tolerance: 0.01
    13: Q fixed during numerical factorization: 0 (auto)
    14: AMD dense row/col parameter:    10
       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
        Only used if the AMD ordering is used.
    15: diagonal pivot tolerance: 0.001
        Only used if diagonal pivoting is attempted.
    16: scaling: 1 (divide each row by sum of abs. values in each row)
    17: frontal matrix allocation ratio: 0.5
    18: drop tolerance: 0
    19: AMD and COLAMD aggressive absorption: 1 (yes)

    The following options can only be changed at compile-time:
    8: BLAS library used:  Fortran BLAS.  size of BLAS integer: 8
    9: compiled for MATLAB
    10: CPU timer is ANSI C clock (may wrap around).
    11: compiled for normal operation (debugging disabled)
    computer/operating system: Microsoft Windows
    size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)

sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
    matrix entry defined as:          double
    Int (generic integer) defined as: UF_long
    BLAS library used: Fortran BLAS.  size of BLAS integer: 8
    MATLAB:                           yes.
    CPU timer:                        ANSI clock ( ) routine.
    number of rows in matrix A:       3468
    number of columns in matrix A:    3468
    entries in matrix A:              60252
    memory usage reported in:         16-byte Units
    size of int:                      4 bytes
    size of UF_long:                  8 bytes
    size of pointer:                  8 bytes
    size of numerical entry:          8 bytes

    strategy used:                    symmetric
    ordering used:                    amd on A+A'
    modify Q during factorization:    no
    prefer diagonal pivoting:         yes
    pivots with zero Markowitz cost:               1284
    submatrix S after removing zero-cost pivots:
        number of "dense" rows:                    0
        number of "dense" columns:                 0
        number of empty rows:                      0
        number of empty columns                    0
        submatrix S square and diagonal preserved
    pattern of square submatrix S:
        number rows and columns                    2184
        symmetry of nonzero pattern:               0.904903
        nz in S+S' (excl. diagonal):               62184
        nz on diagonal of matrix S:                2184
        fraction of nz on diagonal:                1.000000
    AMD statistics, for strict diagonal pivoting:
        est. flops for LU factorization:           2.76434e+007
        est. nz in L+U (incl. diagonal):           306216
        est. largest front (# entries):            31329
        est. max nz in any column of L:            177
        number of "dense" rows/columns in S+S':    0
    symbolic factorization defragmentations:       0
    symbolic memory usage (Units):                 174698
    symbolic memory usage (MBytes):                2.7
    Symbolic size (Units):                         9196
    Symbolic size (MBytes):                        0
    symbolic factorization CPU time (sec):         0.00
    symbolic factorization wallclock time(sec):    0.00

    matrix scaled: yes (divided each row by sum of abs values in each row)
    minimum sum (abs (rows of A)):              1.00000e+000
    maximum sum (abs (rows of A)):              9.75375e+003

    symbolic/numeric factorization:      upper bound               actual      %
    variable-sized part of Numeric object:
        initial size (Units)                  149803               146332    98%
        peak size (Units)                    1037500               202715    20%
        final size (Units)                    787803               154127    20%
    Numeric final size (Units)                806913               171503    21%
    Numeric final size (MBytes)                 12.3                  2.6    21%
    peak memory usage (Units)                1083860               249075    23%
    peak memory usage (MBytes)                  16.5                  3.8    23%
    numeric factorization flops         5.22115e+008         2.59546e+007     5%
    nz in L (incl diagonal)                   593172               145107    24%
    nz in U (incl diagonal)                   835128               154044    18%
    nz in L+U (incl diagonal)                1424832               295683    21%
    largest front (# entries)                 348768                30798     9%
    largest # rows in front                      519                  175    34%
    largest # columns in front                   672                  177    26%

    initial allocation ratio used:                 0.309
    # of forced updates due to frontal growth:     1
    number of off-diagonal pivots:                 0
    nz in L (incl diagonal), if none dropped       145107
    nz in U (incl diagonal), if none dropped       154044
    number of small entries dropped                0
    nonzeros on diagonal of U:                     3468
    min abs. value on diagonal of U:               4.80e-002
    max abs. value on diagonal of U:               1.00e+000
    estimate of reciprocal of condition number:    4.80e-002
    indices in compressed pattern:                 13651
    numerical values stored in Numeric object:     295806
    numeric factorization defragmentations:        0
    numeric factorization reallocations:           0
    costly numeric factorization reallocations:    0
    numeric factorization CPU time (sec):          0.05
    numeric factorization wallclock time (sec):    0.00
    numeric factorization mflops (CPU time):       552.22

    solve flops:                                   1.78396e+006
    iterative refinement steps taken:              1
    iterative refinement steps attempted:          1
    sparse backward error omega1:                  1.80e-016
    sparse backward error omega2:                  0.00e+000
    solve CPU time (sec):                          0.00
    solve wall clock time (sec):                   0.00

    total symbolic + numeric + solve flops:        2.77385e+007

观察线条：

numeric factorization flops         5.22115e+008         2.59546e+007     5%
solve flops:                                   1.78396e+006
total symbolic + numeric + solve flops:        2.77385e+007

这表明 M 的因式分解花费了求解方程所需总时间的 2.59546e+007/2.77385e+007 = 93.6%。

我想在我的迭代之外提前计算分解，然后只运行最后一个阶段，这需要大约 6.5% 的 CPU 时间。

我知道如何计算分解 ( [L,U,P,Q,R] = lu(M);)，但我不知道如何将其输出用作求解器的输入。

我想本着以下精神经营一些事情：

[L,U,P,Q,R] = lu(M);
for ii=1:num_iter
  dx = solve_pre_factored(M,P,Q,R,x);
  x = x+dx;
end

有没有办法在 Matlab 中做到这一点？

score 3 · Accepted Answer

您必须问自己所有这些来自 LU 分解的矩阵是做什么的。

正如文档所述：

[L,U,P,Q,R] = lu(A) 返回单位下三角矩阵 L、上三角矩阵 U、置换矩阵 P 和 Q，以及对角缩放矩阵 R，使得 P*(R\A) Q = L U 用于稀疏非空 A。通常但并非总是如此，行缩放会导致更稀疏且更稳定的分解。语句 lu(A,'matrix') 返回相同的输出值。

因此，在更多的数学术语中，我们有PR^-1AQ = LUA = RP^-1LUQ^-1

然后x = M\x可以按以下步骤重写：

y = R^-1x
z = P y
u = L^-1z
v = U^-1u
w = Q v
x = w

要反转U，您可以使用which 将识别它们是三角形（和对角线L）矩阵 - 正如监控应该确认的那样，并为它们使用适当的平凡求解器。R\R

因此以更密集和 matlab 编写的方式：x = Q*(U\(L\(P*(R\x))));

正如您所问的，这样做将正是在求解器内部发生的事情\，只有一个分解。

但是，正如评论中所述，大量求逆运算N = M^-1一次会变得更快，然后只进行简单的矩阵向量乘法，这比上面解释的过程要简单得多。初始计算inv(M)时间更长并且有一些限制，因此这种权衡还取决于矩阵的属性。

matlab - 是否可以使用预先计算的分解来加速具有稀疏矩阵的反斜杠\mldivide

1 回答 1

Related

Reference