fortran90 - BiCG algorithm in Fortran not working properly?

Question

I'm working on a Bi-Conjugate Gradient algorithm in Fortran and have it fully code, following the algorithm in Saad, Y. "Iterative Methods for Sparse Linear Systems" (the plain BiCG method). However, it is not converging in the required number of iterations, nor is it returning the correct results.

The algorithm is given as in the "Unpreconditioned version" on Wikipedia (http://en.wikipedia.org/wiki/Biconjugate_gradient_method#Unpreconditioned_version_of_the_algorithm)

I am still relatively new to Fortran, and do not understand why exactly this is not behaving as expected because as far as I know its coded exactly as specified. If someone sees any unorthodox code, or faults in the algorithm I would be very grateful!

I have included a test matrix for simplicity:

    !
    !////////////////////////////////////////////////////////////////////////
    !
    !      BiCG_main.f90
    !      Created: 19 February 2013 12:01
    !      By: Robin  Fox  
    !
    !////////////////////////////////////////////////////////////////////////
    !
    PROGRAM bicg_main
    !
    IMPLICIT NONE 
    !-------------------------------------------------------------------
    ! Program to implement the Bi-Conjugate Gradient method
    ! follows algorithm in Saad
    !-------------------------------------------------------------------
    !
    COMPLEX(KIND(0.0d0)), DIMENSION(:,:), ALLOCATABLE       ::A
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::b
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::x0, x0s
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::x, xs
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::p, ps 
    COMPLEX(KIND(0.0d0))                                    ::alpha, rho0, rho1, r_rs
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::r,rs, res_vec
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::Ax, ATx
    COMPLEX(KIND(0.0d0)), DIMENSION(:), ALLOCATABLE         ::Ap, Aps
    COMPLEX(KIND(0.0d0))                                    ::beta
    !
    REAL(KIND(0.0d0))                                       ::tol,res, n2b, n2r0, rel_res
    !
    INTEGER                                                 ::n,i,j,k, maxit
    !//////////////////////////////////////////////////////////////////////// 

    !----------------------------------------------------------   
    n=2
    ALLOCATE(A(n,n))
    ALLOCATE(b(n))

    A(1,1)=CMPLX(-0.73492,7.11486)
    A(1,2)=CMPLX(0.024839,4.12154)
    A(2,1)=CMPLX(0.274492957,3.7885537)
    A(2,2)=CMPLX(-0.632557864,1.95397735)

    b(1)=CMPLX(0.289619736,0.895562183)
    b(2)=CMPLX(-0.28475616,-0.892163111)

    !---------------------------------------------------------- 

    ALLOCATE(x0(n))
    ALLOCATE(x0s(n))

    !Use all zeros initial guess
    x0(:)=CMPLX(0.0d0,0.0d0)
    DO i=1,n
       x0s(i)=CONJG(x0(i))
    END DO 

    ALLOCATE(Ax(n))
    ALLOCATE(ATx(n))
    ALLOCATE(x(n))
    ALLOCATE(xs(n))

    ! Multiply matrix A with vector x0
    DO i=1,n
       Ax(i)=CMPLX(0.0,0.0)
       DO j=1,n
          Ax(i)=Ax(i)+A(i,j)*x0(j) !==Ax=A*x0
       END DO 
    END DO    

    ! Multiply matrix A^T with vector x0
    DO i=1,n
       ATx(i)=CMPLX(0.0,0.0)
       DO j=1,n
          ATx(i)=ATx(i)+CONJG(A(j,i))*x0s(j) !==A^Tx=A^T*x0
       END DO 
    END DO    

    res=0.0d0
    n2b=0.0d0
    x=x0

    ALLOCATE(r(n))
    ALLOCATE(rs(n))
    ALLOCATE(p(n))
    ALLOCATE(ps(n))

    !Initialise
    DO i=1,n
       r(i)=b(i)-Ax(i) 
       rs(i)=CONJG(b(i))-ATx(i)
       p(i)=r(i) !p0=r0
       ps(i)=rs(i) !p0s=r0s
    END DO 

    DO i=1,n
       n2b=n2b+(b(i)*CONJG(b(i)))
       res=res+(r(i)*CONJG(r(i))) !== inner prod(r,r)
    END DO 
    n2b=SQRT(n2b)
    res=SQRT(res)/n2b

    !Check that inner prod(r,rs) =/= 0
    n2r0=0.0d0
    DO i=1,n
       n2r0=n2r0+r(i)*CONJG(rs(i))
    END DO 

    IF (n2r0==0) THEN 
       res=1d-20 !set tol so that loop doesn't run (i.e. already smaller than tol)
       PRINT*, "Inner product of r, rs == 0"
    END IF 
    WRITE(*,*) "n2r0=", n2r0

    !---------------------------------------------------------- 
    ALLOCATE(Ap(n))
    ALLOCATE(Aps(n))
    ALLOCATE(res_vec(n))

    tol=1d-6
    maxit=50 !for n=720

    k=0
    !Main loop:
    main: DO WHILE ((res>tol).AND.(k<maxit))

       k=k+1
       ! Multiply matrix A with vector p
       DO i=1,n
          Ap(i)=CMPLX(0.0,0.0)
          DO j=1,n
             Ap(i)=Ap(i)+A(i,j)*p(j)
          END DO 
       END DO    

       ! Multiply matrix A^T with vector p
       ! N.B. transpose is also conjg.
       DO i=1,n
          Aps(i)=CMPLX(0.0,0.0)
          DO j=1,n
             Aps(i)=Aps(i)+CONJG(A(j,i))*ps(j)
          END DO 
       END DO  

       rho0=CMPLX(0.0d0,0.0d0)
       DO i=1,n
          rho0=rho0+(r(i)*CONJG(rs(i)))
       END DO 
       WRITE(*,*) "rho0=", rho0

       rho1=CMPLX(0.0d0,0.0d0)
       DO i=1,n
          rho1=rho1+(Ap(i)*CONJG(ps(i)))
       END DO 
       WRITE(*,*) "rho1=", rho1

       !Calculate alpha:
       alpha=rho0/rho1
       WRITE(*,*) "alpha=", alpha

       !Update solution
       DO i=1,n
          x(i)=x(i)+alpha*p(i)
       END DO    

       !Update residual:
       DO i=1,n
          r(i)=r(i)-alpha*Ap(i)
       END DO 

       !Update second residual:
       DO i=1,n
          rs(i)=rs(i)-alpha*Aps(i)
       END DO 

       !Calculate beta:
       r_rs=CMPLX(0.0d0,0.0d0)
       DO i=1,n
          r_rs=r_rs+(r(i)*CONJG(rs(i)))
       END DO 
       beta=r_rs/rho0

       !Update direction vectors:
       DO i=1,n
          p(i)=r(i)+beta*p(i)
       END DO 

       DO i=1,n
          ps(i)=rs(i)+beta*ps(i)
       END DO 

       !Calculate residual for convergence check
    !   res=0.0d0
    !   DO i=1,n
    !      res=res+(r(i)*CONJG(r(i))) !== inner prod(r,r)
    !   END DO 
    !---------------------------------------------------------- 
       !Calculate updated residual "res_vec=b-A*x" relative to current x
       DO i=1,n
          Ax(i)=CMPLX(0.0d0, 0.0d0)
          DO j=1,n
                  Ax(i)=Ax(i)+A(i,j)*x(j)
          END DO 
       END DO   

       DO i=1,n
          res_vec(i)=b(i)-Ax(i)
       END DO 

       DO i=1,n
          rel_res=rel_res+(res_vec(i)*CONJG(res_vec(i)))
       END DO 
       res=SQRT(res)/REAL(n2b)
       WRITE(*,*) "res=",res
       WRITE(*,*) " "

    END DO main    
    !----------------------------------------------------------   
    !Output message
    IF (k<maxit) THEN 
       WRITE(*,*) "Converged in",k,"iterations"
    ELSE 
       WRITE(*,*) "STOPPED after",k, "iterations because max no. of iterations was reached"
    END IF 

    !Output solution vector:
    WRITE(*,*) "x_sol="
    DO i=1,n
       WRITE(*,*) x(i) 
    END DO  

    !----------------------------------------------------------   
    DEALLOCATE(x0,x0s, Ax, ATx, x, xs, p, ps ,r, rs, Ap, Aps, res_vec)
    DEALLOCATE(A,b)
    !
    END PROGRAM 
    !
    !////////////////////////////////////////////////////////////////////////

RESULTS: The results to my script are given as:

      STOPPED after          50 iterations because max no. of iterations was reached
     x_sol=
     (-2.88435711452590705E-002,-0.43229898544084933     )
     ( 0.11755325208241280     , 0.73895038053993978     )

while the actual results are given using MATLAB's built in bicg.m function as:

       -0.3700 - 0.6702i
        0.7295 + 1.1571i

score 3 · Accepted Answer

这是您的程序中的一些瑕疵。它们是否是错误有点主观，是否修改代码完全取决于您。

在这一行
```
IF (n2r0==0) THEN 
```
您测试（可能长时间运行）循环的结果总和是否正好为 0。这对于浮点数总是一个坏主意。如果您不知道这一点，请查看这里关于 SO 的许多问题，这些问题floating-point是由于对 fp 算术的合理预期的普遍不精确而引起的。我不认为您在比较左侧使用实数和在右侧使用整数会使事情变得更糟，但这并没有使它们变得更好。
在您的代码中至少有两个地方计算矩阵向量乘积。您可以将这些循环替换为对内部matmul例程的调用（我认为，我没有像我确定的那样仔细检查您的代码）。这实际上可能会减慢您的代码，但这不是现阶段的问题。调用经过良好测试的库例程而不是自己滚动将（a）减少您必须维护/测试/修复的代码量，并且（b）更有可能提供一次正确的解决方案。一旦你的代码工作了，如果你必须担心性能。
您声明了许多具有双精度的实数和复数变量，但使用以下语句初始化它们：
```
A(1,1)=CMPLX(-0.73492,7.11486)
```
双精度变量有大约 15 个十进制数字可用，但在这里您只提供前 6 个数字的值。您不能依赖编译器将其他数字设置为任何特定值。相反，像这样初始化：
```
A(1,1)=CMPLX(-0.73492_dp,7.11486_dp)
```
这将导致这些值被初始化为最接近-0.73492和的双精度数7.11486。当然，你必须以前写过类似的东西dp = kind(0d0)，还有其他方法可以强制文字常量的精度，但这是我通常这样做的方式。如果您有一个提供内部iso_fortran_env模块的现代 Fortran 编译器，您可以将替换_dp为 now-standard _real64。
这段代码
```
x0(:)=CMPLX(0.0d0,0.0d0)
DO i=1,n
   x0s(i)=CONJG(x0(i))
END DO 
```
可以替换为
```
x0  = CMPLX(0.0d0,0.0d0)
x0s = x0
```
使用数组语法将第一个数组归零，然后使用循环将第二个数组归零似乎有点奇怪；CONJG时反复调用似乎更加奇特CONJG(0,0)==(0,0)。
这段代码
```
  DO i=1,n
     n2b=n2b+(b(i)*CONJG(b(i)))
     res=res+(r(i)*CONJG(r(i))) !== inner prod(r,r)
  END DO 
  n2b=SQRT(n2b)
  res=SQRT(res)/n2b
```
如果我理解正确，可以替换为
```
 n2b = sqrt(dot_product(b,b)) 
 res = sqrt(dot_product(r,r))/n2b
```
我实际上并没有看到您的代码有任何问题，但是使用内部函数可以减少您需要编写和维护的行数，就像matmul上面的情况一样。

可能还有其他不那么明显的瑕疵，但这很多应该让你开始。

fortran90 - BiCG algorithm in Fortran not working properly?

1 回答 1

Related

Reference