c - MPI_Recv and timeout

Question

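I have a problem. Suppose I have np processes. For each process, I compute from an input file how many messages I need to send to every other process (from 0 to ...), and I want to send them that number. The problem is that I can only send over a topology I created by directly connecting nodes. So basically I want each process to send one int to every other process, and I have the following algorithm (in pseudocode):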

MPI_Status status;

for (i = 0; i < np; i++) {
    if (i != rankID) {
        MPI_Send(&nr, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&i, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD); // I send the destination along with the int
    }
}
while (1) {
    MPI_Recv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &status);
    // take the destination from the same sender so the two halves stay paired
    MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (destination == rankID) {
        ireceive += recvInt;
        receivedFrom++;
        // normally I would break once I have received all np-1 messages, but what if someone sends a message through me for another process?
    }
    else {
        MPI_Send(&recvInt, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&destination, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
    }
}

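Now a bit more explanation. At the end of this small algorithm, I want each of my processes to know how many messages it will receive in the next step.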

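To send these messages from every node to every node, I use a routing table I created earlier. Basically each node has a matrix covering all nodes, and topology[node][1] = next hop (which is why I wrote nexthop in the code above).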
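To make the routing-table idea concrete, here is a minimal sketch in plain C (no MPI) that follows next-hop entries from a source to a destination. The 4-node line topology, the `NP` constant, and the names `next_hop` and `count_hops` are assumptions made for illustration only; the question's own table is indexed as `topology[node][1]`.

```c
#include <assert.h>

#define NP 4  /* hypothetical number of processes */

/* Hypothetical global view of a line topology 0 - 1 - 2 - 3:
 * next_hop[cur][dst] is the neighbour that cur forwards to when the
 * final destination is dst. In the real program each rank would hold
 * only its own row. */
static const int next_hop[NP][NP] = {
    {0, 1, 1, 1},
    {0, 1, 2, 2},
    {1, 1, 2, 3},
    {2, 2, 2, 3},
};

/* Count how many links a message crosses on its way from src to dst.
 * Returns -1 if the table loops instead of reaching dst. */
static int count_hops(int src, int dst)
{
    assert(src >= 0 && src < NP && dst >= 0 && dst < NP);

    int cur = src;
    int hops = 0;
    while (cur != dst) {
        cur = next_hop[cur][dst];
        if (++hops > NP)
            return -1;  /* more hops than nodes: the table must contain a cycle */
    }
    return hops;
}
```

With this table, a message from rank 0 to rank 3 crosses three links, so ranks 1 and 2 each have to forward it even if they have already received everything addressed to them.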

Every node knows there are np processes, so every node has to receive np-1 messages (those for which it is the destination).

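The problem I have is that after receiving np-1 messages I cannot break, because I may be the next_hop for other processes and their messages would then never be forwarded. So I would like to do something like this: use MPI_Test or some other call to see whether my Recv is actually receiving something or is just sitting there, because if the program blocks for 1-2 seconds it is clear that nothing more will arrive (my topology is not large, 20-30 processes at most).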

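The thing is, I have never used MPI_Test or its syntax, and I don't know how to do this. Can someone help me create a timeout for Recv, or is there another solution? Thanks, and sorry for the long text.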

score 0 · Accepted Answer

This is probably not the most efficient code, but it should work (I haven't had a chance to test it):

#include <mpi.h>
#include <stdbool.h>
#include <time.h>

MPI_Request request;
MPI_Request sendReq[2 * np];  // one pair of send requests per destination
MPI_Status status;
int dest[np];                 // MPI_Isend needs a buffer that stays unchanged, so &i cannot be reused
int nsend = 0;
bool over = false;            // declared outside the loop so it survives iterations

for (i = 0; i < np; i++) {
    if (i != rankID) {
        dest[i] = i;
        MPI_Isend(&nr, 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD, &sendReq[nsend++]);
        MPI_Isend(&dest[i], 1, MPI_INT, topology[i][nexthop], DATA, MPI_COMM_WORLD, &sendReq[nsend++]); // send the destination along with the int
    }
}
while (!over) {
    if (receivedFrom < np - 1) {
        MPI_Recv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &status);
    }
    else {
        // non-blocking receive once everything addressed to you has arrived
        MPI_Irecv(&recvInt, 1, MPI_INT, MPI_ANY_SOURCE, DATA, MPI_COMM_WORLD, &request);
        int flag = 0;
        time_t start = time(NULL);
        while (!flag && time(NULL) < start + time_you_set_until_timeout) {
            MPI_Test(&request, &flag, &status); // flag becomes nonzero if something was received
        }
        if (!flag) {
            // timed out: withdraw the pending receive, then check whether the cancel really took effect
            int cancelled = 0;
            MPI_Cancel(&request);
            MPI_Wait(&request, &status);
            MPI_Test_cancelled(&status, &cancelled);
            if (cancelled) {
                over = true;
                continue;
            }
        }
    }
    // the destination follows from the same sender, so the two halves stay paired
    MPI_Recv(&destination, 1, MPI_INT, status.MPI_SOURCE, DATA, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (destination == rankID) {
        ireceive += recvInt;
        receivedFrom++;
    }
    else {
        // route the message and continue
        MPI_Send(&recvInt, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
        MPI_Send(&destination, 1, MPI_INT, topology[destination][nexthop], DATA, MPI_COMM_WORLD);
    }
}
MPI_Waitall(nsend, sendReq, MPI_STATUSES_IGNORE); // make sure the initial sends have completed

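In any case, since you don't know how long a message can take to travel through your topology, you should choose the timeout carefully. You could try to implement some other kind of signalling mechanism, such as broadcasting a message that tells the others a node has received all the messages addressed to it. That will of course increase the number of messages sent, but it ensures that everyone gets everything. You could also try packing or serializing the data you send so that you need only one Send/Recv call per message, which would make your code easier to work with (in my opinion).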
