Suppose I have a code that runs on 384 MPI processes (24 compute nodes with 16 cores each), and I submit my job to the job queue with the following simple script:

#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=24:ppn=16
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
module load openmpi
mpirun mycode > output_file

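Is the following possible? I need to allocate one more node with 16 cores to do some specific calculations using OpenMP, and at some point update the other 384 processes with the results. I would then have 384 MPI processes, each running a single thread sequentially, plus one MPI process running 16 OpenMP threads.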

Can this be done with OMP_NUM_THREADS and mpirun, or with any other tool?

I would appreciate any suggestions.

Thanks,

Sina

1 Answer

You could request 25 nodes with ppn=16 and then start only 385 MPI processes on the 400 available slots:

#PBS -l nodes=25:ppn=16
...
mpirun -np 384 mycode : -np 1 -x OMP_NUM_THREADS=16 mycode > output_file

This uses the MPMD launch mode of Open MPI, in which different launch configurations are separated by colons. Because ranks are by default placed sequentially over node slots, the first 384 ranks fill exactly 24 nodes and the additional rank is started on the 25th and last node. For that rank only, the OMP_NUM_THREADS environment variable is set to 16, which enables 16 OpenMP threads. If the OpenMP program is a different executable, just substitute its name in the second launch configuration, e.g.:

mpirun -np 384 mycode : -np 1 -x OMP_NUM_THREADS=16 myompcode > output_file
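
The placement argument above is plain integer arithmetic, and it can be checked with a small sketch. The helper name node_of_rank is hypothetical, and the sketch assumes that every node offers the same number of slots and that ranks fill one node completely before moving on to the next (the default by-slot behaviour described above). It does not call mpirun; it only reproduces the counting.

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI): map an MPI rank to a
# zero-based node index, assuming by-slot placement with the same
# number of slots (ppn) on every node.
node_of_rank() {
    local rank=$1 ppn=$2
    echo $(( rank / ppn ))
}

ppn=16
echo "rank 0   -> node $(node_of_rank 0 "$ppn")"     # first single-threaded rank
echo "rank 383 -> node $(node_of_rank 383 "$ppn")"   # last single-threaded rank
echo "rank 384 -> node $(node_of_rank 384 "$ppn")"   # the extra OpenMP rank
```

With ppn=16 this reports node 0, node 23 and node 24 (zero-based), so ranks 0 to 383 occupy the first 24 nodes and rank 384 is alone on the 25th. If the site configuration changes the default mapping policy, this arithmetic no longer applies.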
Answered 2013-07-02T12:12:21.700