
I wrote the small C application below to help me understand MPI and to figure out why MPI_Barrier() isn't functioning in my huge C++ application; it reproduces the problem I see there. Essentially, I call MPI_Barrier() inside a for loop, and the barrier is reached by every rank, yet after two iterations of the loop the program deadlocks. Any thoughts?

#include <mpi.h>
#include <stdio.h>
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int i=0, numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("%s: Rank %d of %d\n", processor_name, rank, numprocs);
    for(i=1; i <= 100; i++) {
            if (rank==0) printf("Before barrier (%d:%s)\n",i,processor_name);
            MPI_Barrier(MPI_COMM_WORLD);
            if (rank==0) printf("After barrier (%d:%s)\n",i,processor_name);
    }

    MPI_Finalize();
    return 0;
}

The output:

alienone: Rank 1 of 4
alienfive: Rank 3 of 4
alienfour: Rank 2 of 4
alientwo: Rank 0 of 4
Before barrier (1:alientwo)
After barrier (1:alientwo)
Before barrier (2:alientwo)
After barrier (2:alientwo)
Before barrier (3:alientwo)

I am using GCC 4.4 and Open MPI 1.3 from the Ubuntu 10.10 repositories.

Also, in my huge C++ application, MPI broadcasts don't work: only half the nodes receive the broadcast, while the others are stuck waiting for it.

Thank you in advance for any help or insights!

Update: I upgraded to Open MPI 1.4.4, compiled from source and installed into /usr/local/.

Update: Attaching GDB to the running processes yields an interesting result. It looks to me as if all ranks have died at the MPI barrier, but MPI still thinks they are running:

0x00007fc235cbd1c8 in __poll (fds=0x15ee360, nfds=8, timeout=<value optimized out>) at   ../sysdeps/unix/sysv/linux/poll.c:83
83  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
    in ../sysdeps/unix/sysv/linux/poll.c
(gdb) bt
#0  0x00007fc235cbd1c8 in __poll (fds=0x15ee360, nfds=8, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1  0x00007fc236a45141 in poll_dispatch () from /usr/local/lib/libopen-pal.so.0
#2  0x00007fc236a43f89 in opal_event_base_loop () from /usr/local/lib/libopen-pal.so.0
#3  0x00007fc236a38119 in opal_progress () from /usr/local/lib/libopen-pal.so.0
#4  0x00007fc236eff525 in ompi_request_default_wait_all () from /usr/local/lib/libmpi.so.0
#5  0x00007fc23141ad76 in ompi_coll_tuned_sendrecv_actual () from /usr/local/lib/openmpi/mca_coll_tuned.so
#6  0x00007fc2314247ce in ompi_coll_tuned_barrier_intra_recursivedoubling () from /usr/local/lib/openmpi/mca_coll_tuned.so
#7  0x00007fc236f15f12 in PMPI_Barrier () from /usr/local/lib/libmpi.so.0
#8  0x0000000000400b32 in main (argc=1, argv=0x7fff5883da58) at barrier_test.c:14
(gdb) 

Update: I also have this code:

#include <mpi.h>
#include <stdio.h>
#include <math.h>
int main(int argc, char *argv[]) {
    int n = 400, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    printf("MPI Rank %i of %i.\n", myid, numprocs);

    while (1) {
        /* Midpoint-rule approximation of pi = integral of 4/(1+x^2) over [0,1],
           with the n intervals divided round-robin among the ranks. */
        h   = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += (4.0 / (1.0 + x*x));
        }
        mypi = h * sum;

        /* Combine the partial sums on rank 0. */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi - PI25DT));
    }

    /* Never reached because of the infinite loop above. */
    MPI_Finalize();
    return 0;
}

And despite the infinite loop, the printf() inside the loop produces output only once:

mpirun -n 24 --machinefile /etc/machines a.out 
MPI Rank 0 of 24.
MPI Rank 3 of 24.
MPI Rank 1 of 24.
MPI Rank 4 of 24.
MPI Rank 17 of 24.
MPI Rank 15 of 24.
MPI Rank 5 of 24.
MPI Rank 7 of 24.
MPI Rank 16 of 24.
MPI Rank 2 of 24.
MPI Rank 11 of 24.
MPI Rank 9 of 24.
MPI Rank 8 of 24.
MPI Rank 20 of 24.
MPI Rank 23 of 24.
MPI Rank 19 of 24.
MPI Rank 12 of 24.
MPI Rank 13 of 24.
MPI Rank 21 of 24.
MPI Rank 6 of 24.
MPI Rank 10 of 24.
MPI Rank 18 of 24.
MPI Rank 22 of 24.
MPI Rank 14 of 24.
pi is approximately 3.1415931744231269, Error is 0.0000005208333338

Any thoughts?


2 Answers


MPI_Barrier() in Open MPI sometimes hangs when processes hit the barrier at different times elapsed since the last barrier, but as far as I can see that is not your case. Anyway, try using MPI_Reduce() instead of, or before, the real call to MPI_Barrier(). That is not a direct equivalent of a barrier, but any synchronous call with almost no payload that involves all processes should work like one. I have not seen this behaviour of MPI_Barrier() in LAM/MPI or MPICH2 or even WMPI, but it was a real issue with Open MPI.
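
For illustration, here is a minimal sketch of that workaround applied to your barrier test (the dummy variables are just placeholders, not something your application has to use):

#include <mpi.h>
#include <stdio.h>
int main(int argc, char* argv[]) {
    int i, rank, dummy_in = 0, dummy_sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 1; i <= 100; i++) {
        if (rank == 0) printf("Before sync (%d)\n", i);
        /* A tiny reduction involving every rank acts as a cheap
           synchronisation point; it is not a strict barrier. */
        MPI_Reduce(&dummy_in, &dummy_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("After sync (%d)\n", i);
    }
    MPI_Finalize();
    return 0;
}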

answered 2012-03-25 07:42

What interconnect do you have? Is it a specialised one like InfiniBand or Myrinet, or are you just using plain TCP over Ethernet? And if you are running with the TCP transport, do the nodes have more than one network interface configured?

Besides, Open MPI is modular: there are many modules that provide algorithms implementing the various collective operations. You can try to fiddle with them using MCA parameters, e.g. you can turn on debug output by passing mpirun something like --mca btl_base_verbose 30. Look for something like this:

[node1:19454] btl: tcp: attempting to connect() to address 192.168.2.2 on port 260
[node2:29800] btl: tcp: attempting to connect() to address 192.168.2.1 on port 260
[node1:19454] btl: tcp: attempting to connect() to address 192.168.109.1 on port 260
[node1][[54886,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.109.1 failed: Connection timed out (110)

In that case some (or all) nodes have more than one network interface configured and up, but not all nodes are reachable through all of those interfaces. This can happen, for example, if the nodes run a recent Linux distribution with Xen support enabled by default (RHEL?) or have other virtualisation software installed that brings up virtual network interfaces.

By default Open MPI is lazy, that is, connections are opened on demand. The first send/receive may succeed if the right interface is picked, but subsequent operations are likely to pick one of the alternative paths in order to maximise the bandwidth. If the other node is not reachable through that second interface, a timeout is likely to occur and the communication will fail, because Open MPI will consider the other node down or problematic.

The solution is to isolate the non-connecting networks or network interfaces using MCA parameters of the TCP btl module:

  • force Open MPI to use only a specific IP network for communication: --mca btl_tcp_if_include 192.168.2.0/24
  • force Open MPI to use only those network interfaces that are known to provide full connectivity: --mca btl_tcp_if_include eth0,eth1
  • force Open MPI not to use network interfaces that are known to be private/virtual or to belong to other networks that do not connect the nodes (if you go this way, you must also exclude the loopback lo): --mca btl_tcp_if_exclude lo,virt0
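
For example, starting from the command line in your question, a debugging run and an interface-restricted run might look like this (eth0 here is only a placeholder for whatever interface actually connects your nodes):

mpirun -n 24 --machinefile /etc/machines --mca btl_base_verbose 30 a.out
mpirun -n 24 --machinefile /etc/machines --mca btl_tcp_if_include eth0 a.out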

See the Open MPI run-time TCP tuning FAQ for more details.

answered 2012-05-04 11:08