I get the following error when I try to submit a job with sbatch:

An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).

When I run sbatch without any arguments it works fine, but as soon as I pass any option to sbatch (for example --job-name or --export), I get the error above.

I am using Open MPI 3 and launching a Python script with mpirun. mpirun and orted both appear to come from the same Open MPI installation, as shown by calling which in my Slurm script just before invoking mpirun:

which mpirun: /opt/openmpi30/bin/mpirun
which orted: /opt/openmpi30/bin/orted
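
For reference, a check of this kind can be sketched roughly as follows. This is an illustrative sketch, not my actual batch script: the helper name report_tool and the file name check_mpi.sh are hypothetical, and the real script goes on to call mpirun afterwards.

```shell
#!/bin/bash
# check_mpi.sh (hypothetical file name): print where each Open MPI tool
# resolves from inside the job environment, before mpirun is invoked.

# report_tool is a hypothetical helper, not part of Slurm or Open MPI.
# It prints "which NAME: PATH", or "not found" if NAME is not on PATH.
report_tool() {
    local name="$1"
    local path
    path="$(command -v "$name" 2>/dev/null)" || path="not found"
    printf 'which %s: %s\n' "$name" "$path"
}

report_tool mpirun
report_tool orted
```

Submitting it once as "sbatch check_mpi.sh" and once as "sbatch --job-name=test check_mpi.sh" should show whether the resolved paths differ between the working and the failing case.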

Any help would be greatly appreciated.
