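I've run into a strange problem: this used to work, but now it doesn't.
I'm running an OpenMPI program with TAU profiling across two computers. It seems that mpirun can't run tau_exec on the remote host. Could it be a permissions problem?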
cluster@master:~/software/mpi_in_30_source/test2$ mpirun -np 2 --hostfile hostfile -d tau_exec -v -T MPI,TRACE,PROFILE ./hello.exe
[master:19319] procdir: /tmp/openmpi-sessions-cluster@master_0/4568/0/0
[master:19319] jobdir: /tmp/openmpi-sessions-cluster@master_0/4568/0
[master:19319] top: openmpi-sessions-cluster@master_0
[master:19319] tmp: /tmp
[slave2:06777] procdir: /tmp/openmpi-sessions-cluster@slave2_0/4568/0/1
[slave2:06777] jobdir: /tmp/openmpi-sessions-cluster@slave2_0/4568/0
[slave2:06777] top: openmpi-sessions-cluster@slave2_0
[slave2:06777] tmp: /tmp
[master:19319] [[4568,0],0] node[0].name master daemon 0 arch ff000200
[master:19319] [[4568,0],0] node[1].name slave2 daemon 1 arch ff000200
[slave2:06777] [[4568,0],1] node[0].name master daemon 0 arch ff000200
[slave2:06777] [[4568,0],1] node[1].name slave2 daemon 1 arch ff000200
[master:19319] Info: Setting up debugger process table for applications
MPIR_being_debugged = 0
MPIR_debug_state = 1
MPIR_partial_attach_ok = 1
MPIR_i_am_starter = 0
MPIR_proctable_size = 2
MPIR_proctable:
(i, host, exe, pid) = (0, master, /home/cluster/software/mpi_in_30_source/test2/tau_exec, 19321)
(i, host, exe, pid) = (1, slave2, /home/cluster/software/mpi_in_30_source/test2/tau_exec, 0)
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not find an executable:
Executable: tau_exec
Node: slave2
while attempting to start process rank 1.
--------------------------------------------------------------------------
[slave2:06777] sess_dir_finalize: job session dir not empty - leaving
[slave2:06777] sess_dir_finalize: job session dir not empty - leaving
[master:19319] sess_dir_finalize: job session dir not empty - leaving
[master:19319] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status -123
On slave2:
cluster@slave2:~/software/mpi_in_30_source/test2$ tau_exec -T MPI,TRACE,PROFILE ./hello.exe
hello MPI user: from process = 0 on machine=slave2, of NCPU=1 processes
cluster@slave2:~/software/mpi_in_30_source/test2$ which tau_exec
/home/cluster/tools/tau-2.22.2/arm_linux/bin/tau_exec
So both nodes have a working tau_exec. When I run mpirun without tau_exec, everything works fine:
cluster@master:~/software/mpi_in_30_source/test2$ mpirun -np 2 --hostfile hostfile ./hello.exe
hello MPI user: from process = 0 on machine=master, of NCPU=2 processes
hello MPI user: from process = 1 on machine=slave2, of NCPU=2 processes
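
One thing that may be worth checking, offered as a guess rather than a confirmed cause: the `which tau_exec` test on slave2 was run in an interactive shell, while mpirun normally starts remote processes through a non-interactive ssh session, where PATH can differ (for example if it is set in a startup file that non-interactive shells skip). The debug output also lists the executable as ./tau_exec resolved against the working directory, which would fit a failed PATH lookup. The sketch below uses a hypothetical helper, `in_path`, which is not part of Open MPI or TAU, to test whether a command is reachable through a given PATH string. The ssh usage in the comments assumes passwordless ssh to slave2.

```shell
# Hypothetical helper (not part of Open MPI or TAU): succeed if an executable
# file named $1 exists in one of the colon-separated directories in $2.
in_path() {
    cmd=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Intended use on the cluster (assumes passwordless ssh to slave2):
#   remote_path=$(ssh slave2 'echo $PATH')   # PATH as a non-interactive shell sees it
#   in_path tau_exec "$remote_path" || echo "tau_exec is not on the non-interactive PATH"
```

If the check fails, one workaround to try is passing the full path to mpirun instead of the bare name, for example /home/cluster/tools/tau-2.22.2/arm_linux/bin/tau_exec. This assumes TAU is installed at the same location on master as on slave2, which the output above does not confirm.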