I am running an MPI program on a cluster. When the program finishes, the job does not, so I have to wait for it to hit the walltime limit. I am not sure how to debug this. I checked whether the program reaches the finalize statements in MPI, and it does. I am using the Elemental library.

The last lines of the program:
```cpp
if (grid.Rank() == 0)
    std::cout << "Finalize" << std::endl;
std::string message = std::string("rank_") +
                      std::to_string(mpi::Rank(mpi::COMM_WORLD)) + "_a";
std::cout << message;
Finalize();
message = message + "b";
std::cout << message;
mpi::Finalize();
message = message + "c";
std::cout << message;
return 0;
```
The output is:
```
Finalize
rank_0_arank_0_abrank_0_abcmpiexec: killall: caught signal 15 (Terminated).
mpiexec: kill_tasks: killing all tasks.
mpiexec: wait_tasks: waiting for taub205.
mpiexec: killall: caught signal 15 (Terminated).
=>> PBS: job killed: walltime 801 exceeded limit 780
----------------------------------------
Begin Torque Epilogue (Tue Nov 4 16:15:19 2014)
Job ID:           ***
Username:         ***
Group:            ***
Job Name:         mpi_test1
Session:          11270
Limits:           ncpus=1,neednodes=1:ppn=6:m24G:taub,nodes=1:ppn=6:m24G:taub,walltime=00:13:00
Resources:        cput=00:02:12,mem=429524kb,vmem=773600kb,walltime=00:13:21
Job Queue:        secondary
Account:          ***
Nodes:            taub205
End Torque Epilogue
----------------------------------------
```
I am running with these modules on https://campuscluster.illinois.edu/hardware/#taub:
```
> module list
Currently Loaded Modulefiles:
  1) torque/4.2.9   5) gcc/4.7.1
  2) moab/7.2.9     6) mvapich2/2.0b-gcc-4.7.1
  3) env/taub       7) mvapich2/mpiexec
  4) blas           8) lapack
```