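I am testing an OpenMPI build that was provided and compiled by another user (I use soft links to his bin, include, etc. directories, i.e. every required directory), but I am running into something strange.
testrunmpi.py simply prints "run." from each core. If I run mpirun with -n set to 10 or less, it works, as shown below: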
# I am in serverA.
bash-3.2$ /home/karl/bin/mpirun -n 10 ./testrunmpi.py
run.
run.
run.
run.
run.
run.
run.
run.
run.
run.
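For reference, here is a minimal sketch of what a script like testrunmpi.py could look like. The real file is not shown here, so the body below is an assumption based only on the fact that it prints "run." once per launched process; the names `MESSAGE` and `main` are made up for illustration.

```python
#!/usr/bin/env python
# Hypothetical stand-in for testrunmpi.py: each process started by mpirun
# executes this file independently and writes one line to stdout.
import sys

MESSAGE = "run."  # assumed text, taken from the output shown above


def main():
    # No MPI calls are needed just to print; mpirun starts N copies of the script.
    sys.stdout.write(MESSAGE + "\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the script makes no MPI calls, the number of "run." lines simply reflects how many copies mpirun managed to start.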
However, when I try to run with -n greater than 10, I get:
bash-3.2$ /home/karl/bin/mpirun -n 24 ./testrunmpi.py
karl@serverB's password: Could not chdir to home directory /home/karl: No such file or directory
bash: /home/karl/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 19203) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
bash-3.2$
bash-3.2$
Permission denied, please try again.
karl@serverB's password:
Permission denied, please try again.
karl@serverB's password:
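I can see that the job is being dispatched to serverB, whereas I am on serverA. I do not have an account on serverB. But if I invoke mpirun with -n <= 10, the job stays on serverA.
This is strange, so I checked /home/karl/etc/openmpi-default-hostfile and tried setting the following: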
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
But the problem persists, and I still get the same error message as above. What do I have to do to make my program run only on serverA?
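To rule out a typo in the hostfile, the two entries above can be read back with a small script. This is only a sketch of a plain `key=value` parser; it does not reproduce how Open MPI itself interprets `slots` or `max_slots`, and `parse_hostfile` and `HOSTFILE_TEXT` are hypothetical names used for illustration.

```python
def parse_hostfile(text):
    """Return {hostname: {option: int}} for lines like 'host slots=N max_slots=M'."""
    hosts = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        options = {}
        for field in fields[1:]:
            key, sep, value = field.partition("=")
            if sep:  # keep only well-formed key=value pairs
                options[key] = int(value)
        hosts[fields[0]] = options
    return hosts


# The two entries tried in openmpi-default-hostfile, copied from above.
HOSTFILE_TEXT = """\
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
"""

if __name__ == "__main__":
    for name, options in sorted(parse_hostfile(HOSTFILE_TEXT).items()):
        print("%s slots=%s max_slots=%s"
              % (name, options.get("slots"), options.get("max_slots")))
```

If the values read back as expected, the file contents themselves are probably not the problem, and the open question is whether mpirun is actually reading this file.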