Existing OpenMP constructs cannot do what you want, so you have to implement it by hand. Suppose the original parallel loop is:
!$OMP DO
DO i = 1, 100
  ...
END DO
!$OMP END DO
A modified version with a custom selection of participating threads would be:
USE OMP_LIB
INTEGER, DIMENSION(:), ALLOCATABLE :: threads
INTEGER :: tid, i, imin, imax, tidx
! IDs of threads that should execute the loop
! Make sure there are no repeated entries
threads = (/ 0, 1, 4 /)
IF (MAXVAL(threads, 1) >= omp_get_max_threads()) THEN
  STOP 'Error: insufficient number of OpenMP threads'
END IF
!$OMP PARALLEL PRIVATE(tid,i,imin,imax,tidx)
  ! Get current thread's ID
  tid = omp_get_thread_num()
  ...
  ! Check if current thread should execute part of the loop
  IF (ANY(threads == tid)) THEN
    ! Find out what thread's index is
    tidx = MAXLOC(threads, 1, threads == tid)
    ! Compute iteration range based on the thread index
    imin = 1 + ((100-1 + 1)*(tidx-1))/SIZE(threads)
    imax = 1 + ((100-1 + 1)*tidx)/SIZE(threads) - 1
    PRINT *, 'Thread', tid, imin, imax
    DO i = imin, imax
      ...
    END DO
  ELSE
    PRINT *, 'Thread', tid, 'not taking part'
  END IF
  ! This simulates the barrier at the end of the worksharing construct
  ! Remove in order to implement the "nowait" clause
  !$OMP BARRIER
  ...
!$OMP END PARALLEL
Here are three sample runs:
$ OMP_NUM_THREADS=2 ./custom_loop.x | sort
STOP Error: insufficient number of OpenMP threads
$ OMP_NUM_THREADS=5 ./custom_loop.x | sort
Thread 0 1 33
Thread 1 34 66
Thread 2 not taking part
Thread 3 not taking part
Thread 4 67 100
$ OMP_NUM_THREADS=7 ./custom_loop.x | sort
Thread 0 1 33
Thread 1 34 66
Thread 2 not taking part
Thread 3 not taking part
Thread 4 67 100
Thread 5 not taking part
Thread 6 not taking part
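The range arithmetic can also be checked on its own, without OpenMP. The following is a minimal sketch rather than part of the original program: the names lo, hi, nsel and covered are introduced here for illustration, with nsel standing in for SIZE(threads) and 100 written as hi-lo+1.

```fortran
PROGRAM check_ranges
  IMPLICIT NONE
  INTEGER, PARAMETER :: lo = 1, hi = 100   ! loop bounds from the example
  INTEGER, PARAMETER :: nsel = 3           ! assumed number of participating threads
  INTEGER :: tidx, imin, imax, covered

  covered = 0
  DO tidx = 1, nsel
    ! Same formulas as in the parallel region, with the bounds named
    imin = lo + ((hi-lo + 1)*(tidx-1))/nsel
    imax = lo + ((hi-lo + 1)*tidx)/nsel - 1
    PRINT *, 'index', tidx, imin, imax
    covered = covered + (imax - imin + 1)
  END DO

  ! Each range starts right after the previous one ends, so the range
  ! sizes must add up to the trip count if nothing is skipped
  IF (covered /= hi - lo + 1) STOP 'Error: ranges do not cover the loop'
END PROGRAM check_ranges
```

With nsel = 3 this gives the ranges 1 to 33, 34 to 66 and 67 to 100, the same split as in the runs above. Integer division puts the remainder into the later ranges, so the ranges may differ in size by one iteration.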
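Note that this is an ugly hack that goes against the basic premise of the OpenMP model. I strongly advise against doing this and against relying on particular threads to execute particular portions of the code, because it produces highly non-portable programs and hinders runtime optimisations.
If you decide to abandon the idea of explicitly assigning the threads that should execute the loop and only want to change dynamically how many threads take part, then the chunk size parameter of the SCHEDULE clause is your friend: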
!$OMP PARALLEL
  ...
  ! 2 threads = 10 iterations / 5 iterations/chunk
  !$OMP DO SCHEDULE(static,5)
  DO i = 1, 10
    PRINT *, i, omp_get_thread_num()
  END DO
  !$OMP END DO
  ...
  ! 10 threads = 10 iterations / 1 iteration/chunk
  !$OMP DO SCHEDULE(static,1)
  DO i = 1, 10
    PRINT *, i, omp_get_thread_num()
  END DO
  !$OMP END DO
  ...
!$OMP END PARALLEL
And the output with 10 threads:
$ OMP_NUM_THREADS=10 ./loop_chunks.x | sort_manually :)
First loop
Iteration Thread ID
1 0
2 0
3 0
4 0
5 0
6 1
7 1
8 1
9 1
10 1
Second loop
Iteration Thread ID
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
10 9
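The chunk size does not have to be a literal constant, so it can be computed at run time from the number of threads that should receive work. The following is a minimal sketch under that assumption: nwanted and chunk are hypothetical names introduced here, and the program is an illustration rather than the original loop_chunks.x.

```fortran
PROGRAM chunk_demo
  USE OMP_LIB
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 10   ! trip count, as in the loops above
  INTEGER :: i, nwanted, chunk

  nwanted = 2                    ! hypothetical: how many threads should get work
  ! Round up so that the n iterations fit into at most nwanted chunks
  chunk = (n + nwanted - 1)/nwanted

  !$OMP PARALLEL
  !$OMP DO SCHEDULE(static,chunk)
  DO i = 1, n
    PRINT *, i, omp_get_thread_num()
  END DO
  !$OMP END DO
  !$OMP END PARALLEL
END PROGRAM chunk_demo
```

With nwanted = 2 the chunk size is 5, which reproduces the first loop above when the team has at least two threads. Because of the rounding, the loop may end up with fewer than nwanted chunks (for example n = 10 and nwanted = 6 give a chunk size of 2 and therefore only 5 working threads), so nwanted is best read as an upper bound.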