
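As far as I understand, MPI_Barrier() is used to bring all processes to the same point. I need to find the overall processing time of an Open MPI program (the time at which all processes have finished), so I thought that putting an MPI_Barrier() at the end and then printing MPI_Wtime()-t would give the time at which all processes have finished.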

        MPI_stuff;//whatever i want my program to do
        MPI_Barrier(MPI_COMM_WORLD);
        cout << "final time ::: :: " << MPI_Wtime()-t << " rank " << rank << endl;
        MPI_Finalize();

But the time I get when using MPI_Barrier() is very different from the MPI_Wtime()-t of the individual processes.


1 Answer


It is very easy for MPI processes to become desynchronised in time, especially if the algorithms involved in MPI_stuff are not globally synchronous. With most cluster MPI implementations, processes are typically quite desynchronised from the very beginning because of their different start-up times and because MPI_Init() can take a varying amount of time. Another source of desynchronisation is OS noise, i.e. other processes occasionally sharing CPU time with some of the processes in the MPI job.

That's why the correct way to measure the execution time of a parallel algorithm is to put a barrier before and after the measured block:

MPI_Barrier(MPI_COMM_WORLD); // Bring all processes in sync
t = -MPI_Wtime();
MPI_stuff;
MPI_Barrier(MPI_COMM_WORLD); // Wait for all processes to finish processing
t += MPI_Wtime();

If the first MPI_Barrier is missing and MPI_stuff does not synchronise the different processes, it could happen that some of them arrive at the next barrier very early while others arrive very late, and then the early ones have to wait for the late ones.
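The effect of the missing first barrier can be illustrated with a small model. This is only a sketch, not MPI code: the function `measured_times` and its parameters `ready` and `work` are hypothetical names introduced here, and it assumes an idealised final barrier that releases every process at the moment the slowest one arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not MPI code: rank r becomes ready at ready[r] seconds
// and then needs work[r] seconds for MPI_stuff. Returns the value of t that
// each rank would report after the final barrier.
// Assumption: the final barrier releases all ranks at the same instant.
std::vector<double> measured_times(const std::vector<double>& ready,
                                   const std::vector<double>& work,
                                   bool barrier_before) {
    assert(!ready.empty() && ready.size() == work.size());
    const std::size_t n = ready.size();
    // With a leading barrier, no rank starts its timer before the slowest
    // rank is ready.
    const double sync = *std::max_element(ready.begin(), ready.end());
    std::vector<double> start(n);
    double release = 0.0;  // moment at which the final barrier opens
    for (std::size_t r = 0; r < n; ++r) {
        start[r] = barrier_before ? sync : ready[r];
        release = std::max(release, start[r] + work[r]);
    }
    std::vector<double> t(n);
    for (std::size_t r = 0; r < n; ++r) {
        t[r] = release - start[r];  // includes time spent waiting in the barrier
    }
    return t;
}
```

With made-up inputs `ready = {0.0, 0.5, 2.0}` and `work = {1.0, 1.0, 1.0}`, the model reports 3, 2.5 and 1 seconds without the leading barrier, because the early ranks also count the time they spend waiting for the late one. With the leading barrier every rank reports 1 second, which is the actual duration of the work.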

Also note that MPI_Barrier gives no guarantee that all processes exit the barrier at the same time. It only guarantees that there is a point in time when the execution flow in all processes is inside the MPI_Barrier call. Everything else is implementation dependent. On some platforms, notably the IBM Blue Gene, global barriers are implemented using a special interrupt network, and there MPI_Barrier achieves almost cycle-perfect synchronisation. On clusters, barriers are implemented with message passing, and therefore barrier exit times might vary a lot.

answered 2013-07-11T14:49:43.183