c - Barrier call gets stuck in Open MPI (C program)

Question

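I am practicing synchronization through a barrier using Open MPI message passing. I have created an array of structs called containers. Each container is linked to its neighbor on the right, and the elements at the two ends are also linked, forming a circle.

In the main() test client, I run MPI with multiple processes (mpiexec -n 5 ./a.out), and they are supposed to synchronize by calling the barrier() function. However, my code gets stuck at the last process. I am looking for help with debugging. Please see my code below: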

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>

typedef struct container {
    int labels;                  
    struct container *linked_to_container;    
    int sense;
} container;

container *allcontainers;   /* an array for all containers */
int size_containers_array;

int get_next_container_id(int current_container_index, int max_index)
{
    if (max_index - current_container_index >= 1)
    {
        return current_container_index + 1;
    }
    else 
        return 0;        /* elements at two ends are linked */
}

container *get_container(int index)
{
    return &allcontainers[index];
}


void container_init(int num_containers)
{
    allcontainers = (container *) malloc(num_containers * sizeof(container));  /* is this right to malloc memory on the array of container when the struct size is still unknown?*/
    size_containers_array = num_containers;

    int i;
    for (i = 0; i < num_containers; i++)
    {
        container *current_container = get_container(i);
        current_container->labels = 0;
        int next_container_id = get_next_container_id(i, num_containers - 1);     /* max index in all_containers[] is num_containers-1 */
        current_container->linked_to_container = get_container(next_container_id);
        current_container->sense = 0;   
    }
}

void container_barrier()
{
    int current_container_id, my_sense = 1;
    int tag = current_container_id;
    MPI_Request request[size_containers_array];
    MPI_Status status[size_containers_array];

    MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
    container *current_container = get_container(current_container_id);

    int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);

    /* send asynchronous message to the next container, wait, then do blocking receive */
    MPI_Isend(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, &request[current_container_id]);
    MPI_Wait(&request[current_container_id], &status[current_container_id]);
    MPI_Recv(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

}

void free_containers()
{
    free(allcontainers);
}

int main(int argc, char **argv)
{
    int my_id, num_processes;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    container_init(num_processes);

    printf("Hello world from thread %d of %d \n", my_id, num_processes);
    container_barrier();
    printf("passed barrier \n");



    MPI_Finalize();
    free_containers();

    return 0;
}

score 1 · Accepted Answer

The problem is this sequence of calls:

MPI_Isend()
MPI_Wait()
MPI_Recv()

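This is a common source of confusion. When you use a "nonblocking" call in MPI, you are essentially telling the MPI library that you want to perform some operation (send) on some data (my_sense). MPI gives you back an MPI_Request object, with the guarantee that the operation will be finished by the time a completion function called on that MPI_Request returns.

The problem you have here is that you call MPI_Isend and then immediately call MPI_Wait, before MPI_Recv has been called on any rank. This means that all of those send calls get queued up but never actually have anywhere to go, because you have not yet told MPI where to put the data by calling MPI_Recv (which tells MPI that you want the data placed in my_sense).

The reason this works part of the time is that MPI expects that things might not always be perfectly synchronized. If your messages are small (which yours are), MPI reserves some buffer space and lets your MPI_Send operations complete; the data is stashed in that temporary space for a while until you call MPI_Recv later to tell MPI where to move it. Eventually, though, this stops working. The buffers fill up and you need to actually start receiving your messages. For you, this means you need to switch the order of your operations. Instead of doing a nonblocking send, do a nonblocking receive first, then a blocking send, then wait for the receive to finish: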

MPI_Irecv()
MPI_Send()
MPI_Wait()

Another option is to turn both calls into nonblocking calls and use MPI_Waitall instead:

MPI_Isend()
MPI_Irecv()
MPI_Waitall()

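This last option is usually the best. The only thing you need to be careful about is not overwriting your own data. Right now, you are using the same buffer for both the send and the receive operation. If the two happen at the same time, there is no guarantee about their ordering. Usually that makes no difference: it does not matter whether you send your message first or receive first. In this case, however, it does. If you receive the data first, you will end up sending that same data back out again, rather than the data that was there before the receive. You can solve this by using a temporary buffer to stage the data and moving it to the right place when it is safe.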
