I'm experimenting with MPI, and I keep getting this error when I run my program with mpirun on the command line.

----------------------------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
----------------------------------------------------------------------------------------------

I'm not sure why, because other MPI programs run fine.

Here is my code.

#include <stdio.h>
#include <mpi.h>

int func(int num){
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (num == 0){
        num = 5;
        MPI_Bcast(&num, 1, MPI_INT, rank, MPI_COMM_WORLD);
    }
    return num;
}

int main(int argc, char **argv){
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("On processor %d, func returns %d\n", rank, func(rank));
    MPI_Finalize();
    return 0;
}

The program still gives me the same error. Is an MPI_Bcast inside an if statement invalid? Does it still work if you try to broadcast when you are not root?

2 Answers

The signature of MPI_Bcast, as I see it in every reference document, is int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm). However, you are passing only four arguments, and it looks like you forgot either the first or the second argument.

What is num in your case, and what is your buffer? The answer to this will likely resolve your question, but I am also not sure why your code even compiles. If num is what you want to broadcast, check whether MPI_Bcast(&num, 1, MPI_INT, rank, MPI_COMM_WORLD) works for you.

There is another, very serious independent problem. You have some int rank; on your stack and pass this to MPI_Bcast before you ever initialize it. Who is sending? If root is, you could just as well pass 0, or initialize properly by int rank = 0;.

An indeterminate value for rank is almost certainly the reason your job aborts, because the processes will be sending or receiving at random.

answered 2013-01-24T22:42:01.053

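This code makes no sense. MPI_Bcast is a collective communication call, which means that for the operation to complete successfully, every rank in the supplied communicator (MPI_COMM_WORLD in your case) must call it. MPI_Bcast is also a rooted operation: there is one designated source of the data, namely the process with the rank given as root. So besides the requirement that every rank call MPI_Bcast, all ranks must also pass the same value for root.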

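Your program calls MPI_Bcast only when the num argument of func is 0, which happens only in rank 0. In all other ranks func does not call MPI_Bcast; they just finalize the library and exit. This causes MPI_Bcast to fail eventually, because it tries to send a message to processes that are no longer available, and that produces the error. I say "eventually" because the standard allows early local completion, and in some cases, especially with small messages like yours, the send is buffered. By default, MPI handles errors by aborting the job rather than returning an error code.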

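Nothing prevents you from calling an MPI collective function from inside a conditional, but you must be careful to make sure that every rank eventually makes the collective call, no matter which code path it takes to get there.

A correct version of your func would be: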

int func(int num) {
    if (num == 0) {
        num = 5;
    }
    MPI_Bcast(&num, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return num;
}

With the call placed inside the conditional, it could be:

int func(int num) {
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        num = 5;
        MPI_Bcast(&num, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }
    else
        MPI_Bcast(&num, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return num;
}

(but this is completely unnecessary)

answered 2013-01-25T13:19:24.153