0

我正在编写一个 MPI 程序(Visual Studio 2k8 + MSMPI),它使用 Boost::thread 为每个 MPI 进程生成两个线程,并且遇到了我无法追踪的问题。

当我使用: 运行程序时mpiexec -n 2 program.exe,其中一个进程突然终止:

job aborted:
[ranks] message

[0] terminated

[1] process exited without calling finalize

---- error analysis -----

[1] on winblows
program.exe ended prematurely and may have crashed. exit code 0xc0000005


---- error analysis -----

我不知道为什么第一个进程突然终止,并且无法弄清楚如何追查原因。即使我在所有操作结束时将零级进程放入无限循环中,也会发生这种情况......它只是突然死亡。我的主要功能如下所示:

int _tmain(int argc, _TCHAR* argv[])
{
    /* Initialize the MPI execution environment. */
    MPI_Init(0, NULL);

    /* Create the worker threads. */
    boost::thread masterThread(&Master);
    boost::thread slaveThread(&Slave);

    /* Wait for the local test thread to end. */
    masterThread.join();
    slaveThread.join();

    /* Shutdown. */
    MPI_Finalize();
    return 0;
}

Where the master and slave functions do some arbitrary work before ending. I can confirm that the master thread, at the very least, is reaching the end of it's operations. The slave thread is always the one that isn't done before the execution gets aborted. Using print statements, it seems like the slave thread isn't actually hitting any errors... it's happily moving along and just get's taken out in the crash.

So, does anyone have any ideas for:
a) What could be causing this?
b) How should I go about debugging it?

Thanks so much!

Edit:

Posting minimal versions of the Master/Slave functions. Note that the goal of this program is purely for demonstration purposes... so it isn't doing anything useful. Essentially, the master threads send a dummy payload to the slave thread of the other MPI process.

void Master()
{   
    int  myRank;
    int  numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    /* Create a message with numbers 0 through 39 as the payload, addressed 
     * to this thread. */
    int *payload= new int[40];
    for(int n = 0; n < 40; n++) {
        payload[n] = n;
    }

    if(myRank == 0) {
        MPI_Send(payload, 40, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD);
    } else {
        MPI_Send(payload, 40, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD);
    }

    /* Free memory. */
    delete(payload);
}

void Slave()
{
    MPI_Status status;
    int *payload= new int[40];
    MPI_Recv(payload, 40, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* Free memory. */
    delete(payload);
}
4

1 回答 1

1

you have to use thread safe version of mpi runtime. read up on MPI_Init_thread.

于 2010-03-09T22:48:02.030 回答