python - Python loop not running in SLURM

Translated from: https://stackoverflow.com/questions/65010786 2020-11-25T18:35:13.823

Viewed 47 times

I have Python code that uses an MPI-parallelized function (call it integrator_MPI). I run this code on an HPCC by submitting a job with the following lines:

#!/bin/bash
#SBATCH --job-name=Job        # create a short name for your job
#SBATCH --cpus-per-task=12       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=5G         # memory per cpu-core (4G is default)
#SBATCH --time=60:00:00          # total run time limit (HH:MM:SS)


export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #This is required by the cluster
export I_MPI_FABRICS=shm:ofa                  #This is required by the cluster

mpirun -n 12  python3 my_code.py

With this setup, the code works correctly.

But when I modify my Python code so that it calls integrator_MPI() multiple times,

for i in range(Ntimes):
    ####______ code block that alters the input/data___ ###############
    #...
    integrator_MPI()

only the first iteration is computed, and the code never stops running. This happens only when I run it on the cluster. On my laptop, the loop works fine.
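One way to narrow down where the job hangs is to log, per rank and per iteration, when the integrator is entered and when it returns. The following is only a minimal diagnostic sketch, not a fix: `rank_label`, `log`, and `run_loop` are hypothetical helper names, the body of `integrator_MPI` is a placeholder, and the environment variables checked are ones that common launchers usually set (an assumption that may not hold on every cluster).

```python
import os
import time

# Environment variables that MPI launchers commonly export with the rank of
# each process (assumption: at least one of them is set on the cluster).
RANK_VARS = ("PMI_RANK", "OMPI_COMM_WORLD_RANK", "SLURM_PROCID")


def rank_label():
    """Return the rank reported by the launcher, or "?" if none is set."""
    for var in RANK_VARS:
        value = os.environ.get(var)
        if value is not None:
            return value
    return "?"


def log(message):
    """Print a timestamped, rank-tagged line and flush it immediately.

    Flushing matters in batch jobs: buffered output can make a loop that is
    progressing look as if it were stuck.
    """
    stamp = time.strftime("%H:%M:%S")
    print(f"[rank {rank_label()} {stamp}] {message}", flush=True)


def integrator_MPI():
    """Placeholder for the MPI-parallel routine from the question."""


def run_loop(n_times, step=integrator_MPI):
    """Call `step` n_times, logging before and after each call."""
    completed = 0
    for i in range(n_times):
        log(f"iteration {i}: calling integrator")
        step()
        log(f"iteration {i}: integrator returned")
        completed += 1
    return completed


if __name__ == "__main__":
    run_loop(3)
```

If some ranks print "integrator returned" for iteration 0 while others never do, or only some ranks reach iteration 1, that would suggest the ranks are not all making the same sequence of MPI calls inside the loop.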

Should I write the loop in the job file instead, or is there a way to fix this?
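If the loop were moved into the job file, each mpirun launch would have to run exactly one iteration, so the script would need to be told which one. A minimal sketch of that idea, assuming a hypothetical `--iteration` flag and a placeholder `integrator_MPI` that simply returns the index it was given:

```python
import argparse


def integrator_MPI(iteration):
    """Placeholder for the MPI-parallel routine from the question."""
    return iteration


def parse_args(argv=None):
    """Read the iteration index from the command line."""
    parser = argparse.ArgumentParser(description="Run one iteration per launch.")
    parser.add_argument(
        "--iteration",
        type=int,
        default=0,
        help="hypothetical flag: index of the iteration to run",
    )
    # parse_known_args ignores any extra arguments a launcher might append.
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv=None):
    args = parse_args(argv)
    # The code block that alters the input/data for this iteration goes here.
    return integrator_MPI(args.iteration)


if __name__ == "__main__":
    main()
```

The job file would then call something like `mpirun -n 12 python3 my_code.py --iteration $i` inside a bash `for` loop. This trades MPI start-up cost per iteration for a clean MPI environment on every launch, and it only works if the state carried between iterations can be written to and read from disk.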
