I am currently accelerating a Fortran code in which a contained subroutine (subsub) accesses and modifies variables declared in its parent subroutine (sub):
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i
    !$acc kernels loop
    do i = 1, 10
      call subsub
    enddo
  contains
    subroutine subsub
      !$acc routine
      var(i) = i
    endsubroutine
  endsubroutine
endmodule
program test
  use mod
  call sub
endprogram
When compiled with the PGI compiler, version 20.9-0, it complains that subsub cannot refer to the host variable var:
sub:
8, Generating implicit copy(.S0000) [if not already present]
9, Loop is parallelizable
Generating Tesla code
9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
NVFORTRAN-S-0155-acc routine cannot be used for contained subprograms that refer to host subprogram data: var (test.f90)
0 inform, 0 warnings, 1 severes, 0 fatal for subsub
That makes sense. I tried creating var on the device with acc data create(var) or acc declare create(var), but it does not change the outcome.
Is there a way to accelerate this pattern?
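For reference, here is one restructuring I am considering, as a sketch rather than a confirmed fix. Since the error is about a contained subprogram referring to host subprogram data, moving subsub to module scope and passing var and i as explicit arguments would remove that reference. The argument list, the seq clause, the copy(var) clause, and the switch from kernels loop to parallel loop are all my assumptions and are not part of the original code; I have not verified this against PGI 20.9.

```fortran
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i
    ! Assumption: parallel loop with an explicit copy clause replaces the
    ! original kernels loop, so the compiler need not prove independence.
    !$acc parallel loop copy(var)
    do i = 1, 10
      call subsub(var, i)
    enddo
    print *, var
  end subroutine sub

  ! Assumption: subsub now lives at module scope and receives its data
  ! through arguments instead of host association with sub.
  subroutine subsub(var, i)
    !$acc routine seq
    integer, intent(inout) :: var(10)
    integer, intent(in) :: i
    var(i) = i
  end subroutine subsub
end module mod

program test
  use mod
  call sub
end program test
```

Without OpenACC enabled the directives are treated as comments, so the same source also builds and runs on the host. The drawback is that every variable subsub touches has to be threaded through its argument list, which is what the contained form was meant to avoid.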