I am using mvapich on a supercomputing cluster (PBS environment). My program needs MPI_THREAD_MULTIPLE support to run, but its output shows that MPI_Init_thread failed to enable MPI_THREAD_MULTIPLE.

The PBS script is:

#!/bin/sh
APP_NAME=score
NP=2
NP_PER_NODE=1
RUN="RAW"

rm -f hosts.list
for i in `echo $LSB_HOSTS`; do
echo $i >>hosts.list
done

/home/compiler/mpi/mvapich/1.0/icc.ifort-9.1/bin/mpirun_rsh -np 2 -hostfile ./hosts.list MV2_ENABLE_AFFINITY=0 /home/users/simmykq/users/zhengyuan/mpi_parallel_framework/master_slave/exe_framework

(The last line is the command that launches the executable.)

My program looks like this:

#include <stdio.h>
#include <pthread.h>
#include <mpi.h>

int main(int argc,char *argv[])
{
        int p,id;
        int t;
        int provided;
        pthread_t tid[4];

        MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provided);
        if(provided!=MPI_THREAD_MULTIPLE)
        {
                printf("MPI cannot support mutiple\n");
                MPI_Abort(MPI_COMM_WORLD,0);
        }
   //...........
}

The output looks like this:

Sender: LSF System <lsfadmin@a328>
Subject: Job 2958650: <t> Exited

Job <t> was submitted from host <inode01> by user <simmykq> in cluster <MagicCube_SC1>.
Job was executed on host(s) <1*a328>, in queue <score>, as user <simmykq> in cluster <MagicCube_SC1>.
                            <1*a215>
</home/users/simmykq> was used as the home directory.
</home/users/simmykq/users/zhengyuan/mpi_parallel_framework/master_slave> was used as the working directory.
Started at Sun Mar 16 12:51:09 2014
Results reported at Sun Mar 16 12:51:36 2014

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
./test2.lsf
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time   :      0.57 sec.

The output (if any) follows:

Exit code -3 signaled from a328
MPI cannot support mutiple
MPI cannot support mutiple
Killing remote processes...[0] [MPI Abort by user] Aborting Program!
[1] [MPI Abort by user] Aborting Program!
Abort signaled by rank 0: MPI Abort by user Aborting program !
Abort signaled by rank 1: MPI Abort by user Aborting program !
MPI process terminated unexpectedly
MPI process terminated unexpectedly
DONE
Signal 15 received.
Signal 15 received.

Thanks for any hints. :)

1 Answer

In your code you call MPI_Init_thread with MPI_THREAD_MULTIPLE, but the level the call returns in provided is not equal to MPI_THREAD_MULTIPLE:

 MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provided);
 if(provided!=MPI_THREAD_MULTIPLE)

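This means the MPI library you have installed does not support MPI_THREAD_MULTIPLE. You need to rebuild or reinstall the MPI library in a configuration that supports MPI_THREAD_MULTIPLE. For example, in MPICH2 only two data transfer layers support MPI_THREAD_MULTIPLE: nemesis and sock. I don't know MVAPICH, but check its configure options and its configure output.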

Answered 2014-03-17T04:52:59.620