
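I have noticed that MPI_Finalize() takes a long time to complete in my MPI program, roughly 10 to 20 seconds, whereas the program itself finishes in a few milliseconds (it produces the correct result almost immediately).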

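The Open MPI manual (http://www.open-mpi.org/doc/v1.6/man3/MPI_Finalize.3.php) states that MPI_Finalize() should only check for pending communications. I inferred from this that it should either fail, if some communication is unmatched, or complete very quickly.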

What are possible explanations for MPI_Finalize taking so long to complete?

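Update: the problem seems to appear when the same program is executed several times, i.e. MPI_Finalize is usually fast on the first run and then degrades on later runs. It is noticeable even for a very simple program like this one: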

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* starts MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

In addition, the problem does not seem to depend on the number of processes. I am seeing it on a dual-socket Intel Xeon E5520 @ 2.27GHz.
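To check whether the delay is really spent inside MPI_Finalize rather than in mpirun's own teardown, the call could be timed with a clock that does not depend on MPI. The following is only a minimal sketch, assuming a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available; the helper names `elapsed_seconds` and `time_call` are hypothetical and not part of MPI or Open MPI.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request clock_gettime from <time.h> */
#endif
#include <stdio.h>
#include <time.h>

/* Difference between two timestamps, in seconds (hypothetical helper). */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Run fn() once and return its wall-clock duration in seconds.
   Returns -1.0 if the monotonic clock cannot be read (hypothetical helper). */
static double time_call(void (*fn)(void))
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    fn();
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return elapsed_seconds(&start, &end);
}
```

In the hello-world program, one could define a small wrapper such as `static void finalize_only(void) { MPI_Finalize(); }` (again a hypothetical name), replace the `MPI_Finalize();` line with `double t = time_call(finalize_only);`, and print `t` together with `rank`. A non-MPI clock is used because `MPI_Wtime` cannot be relied on once `MPI_Finalize` has returned. If every rank reports a small value while the whole run still takes 10 to 20 seconds, the time is presumably being spent outside the application processes.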

Update 2

[andromeda.di.unipi.it:03918] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/0/0
[andromeda.di.unipi.it:03918] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/0
[andromeda.di.unipi.it:03918] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03918] tmp: /tmp
[andromeda.di.unipi.it:03918] mpirun: reset PATH: /tmp/OPENMPI/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/lottarin/bin
[andromeda.di.unipi.it:03918] mpirun: reset LD_LIBRARY_PATH: /tmp/OPENMPI/lib
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received add_local_procs


MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_forward_output = 0
  MPIR_proctable_size = 4
  MPIR_proctable:
    (i, host, exe, pid) = (0, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3919)
    (i, host, exe, pid) = (1, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3920)
    (i, host, exe, pid) = (2, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3921)
    (i, host, exe, pid) = (3, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3922)
MPIR_executable_path: NULL
MPIR_server_arguments: NULL
[andromeda.di.unipi.it:03920] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/1
[andromeda.di.unipi.it:03920] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03920] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03920] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],1]
[andromeda.di.unipi.it:03919] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/0
[andromeda.di.unipi.it:03919] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03919] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03919] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],0]
[andromeda.di.unipi.it:03920] [[18136,1],1] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03919] [[18136,1],0] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03922] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/3
[andromeda.di.unipi.it:03922] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03922] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03922] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],3]
[andromeda.di.unipi.it:03921] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/2
[andromeda.di.unipi.it:03921] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03921] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03921] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],2]
[andromeda.di.unipi.it:03922] [[18136,1],3] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03921] [[18136,1],2] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received message_local_procs
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received message_local_procs
Hello world from process 1 of 4
Hello world from process 3 of 4
Hello world from process 0 of 4
Hello world from process 2 of 4
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received message_local_procs
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],1]
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],3]
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],0]
[andromeda.di.unipi.it:03920] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03922] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],2]
[andromeda.di.unipi.it:03919] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03921] sess_dir_finalize: proc session dir not empty - leaving
**LAGS HERE after having received sync from all processes**
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] sess_dir_finalize: job session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] Releasing job data for [18136,1]
[andromeda.di.unipi.it:03918] sess_dir_finalize: job session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] Releasing job data for [18136,0]
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status 0