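I am currently accelerating a Fortran code in which the main accelerated loop lives in subroutine sub. Inside that loop, I want to call subroutine subsub on the device using acc routine. The subroutine has an intent(out) argument val, which is private in the loop. Since subsub contains a loop of its own, I want to use the vector clause: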

module calc
  implicit none
  public :: sub
  private
contains
  subroutine sub()
    integer :: i
    integer :: array(10)
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      call subsub(val)
      array(i) = val
    enddo
    print "(10(i0, x))", array
  endsubroutine
  subroutine subsub(val)
    !$acc routine vector
    integer, intent(out) :: val
    integer :: i
    val = 0
    !$acc loop independent reduction(+:val)
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test
  use calc, only: sub
  implicit none
  call sub()
endprogram

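When I compile with the PGI compiler version 20.9-0 and run the program, I get garbage values in the variable array. When I simply use acc routine for subsub (without the vector clause), I get the correct behavior: every element of array is 10. What is wrong with my approach to parallelizing this subroutine?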
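For comparison, the variant reported above as working can be sketched as follows. This is only an illustration under stated assumptions: it assumes that a bare acc routine is treated as sequential, so the sketch writes the seq clause explicitly and drops the inner loop directive. The names calc_seq and test_seq, the array dummy argument, and the error stop check are hypothetical additions for this sketch and are not part of the original code.

```fortran
module calc_seq
  implicit none
  private
  public :: sub
contains
  subroutine sub(array)
    ! array is passed out so the caller can check it (assumption for this sketch)
    integer, intent(out) :: array(10)
    integer :: i
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      call subsub(val)
      array(i) = val
    enddo
  endsubroutine
  subroutine subsub(val)
    ! Sequential device routine: each calling thread runs the whole inner loop itself
    !$acc routine seq
    integer, intent(out) :: val
    integer :: i
    val = 0
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test_seq
  use calc_seq, only: sub
  implicit none
  integer :: array(10)
  call sub(array)
  ! Stop with an error if any element differs from the inner trip count of 10
  if (any(array /= 10)) error stop "unexpected value in array"
  print "(10(i0, 1x))", array
endprogram
```

Because the directives are comments to a compiler without OpenACC enabled, the same file can also be built and run on the host as a sanity check before comparing against the vector version.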

1 Answer

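This does look like a compiler code-generation issue in how val is handled in the main loop. Fortunately, the workaround is easy: just initialize val inside the main loop before the call.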

% cat test.f90
module calc
  implicit none
  public :: sub
  private
contains
  subroutine sub()
    integer :: i
    integer :: array(10)
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      val = 0
      call subsub(val)
      array(i) = val
    enddo
    print "(10(i0, x))", array
  endsubroutine
  subroutine subsub(val)
    !$acc routine vector
    integer, intent(out) :: val
    integer :: i
    val = 0
    !$acc loop independent reduction(+:val)
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test
  use calc, only: sub
  implicit none
  call sub()
endprogram
% nvfortran -acc -Minfo=accel test.f90 -V20.9 ; a.out
sub:
     10, Generating implicit copyout(array(:)) [if not already present]
     11, Loop is parallelizable
         Generating Tesla code
         11, !$acc loop gang ! blockidx%x
subsub:
     18, Generating Tesla code
         24, !$acc loop vector ! threadidx%x
             Generating reduction(+:val)
             Vector barrier inserted for vector loop reduction
     24, Loop is parallelizable
10 10 10 10 10 10 10 10 10 10
Answered 2020-12-17T17:38:09.793