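I am trying out the mpiexec command, and it returns some SIGSEGV errors. My question, however, is not about why the error occurs but about how it is displayed.
Consider the error output below: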
[songyi719-thinkpad-x1-extreme-2nd:172415] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172415] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172415] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:172415] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:172412] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172412] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172412] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:172412] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:172413] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172413] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172413] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:172413] Failing at address: 0x440000e8
[songyi719-thinkpad-x1-extreme-2nd:172415] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f0c3a59e3c0]
[songyi719-thinkpad-x1-extreme-2nd:172415] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7f0c3a78771b]
[songyi719-thinkpad-x1-extreme-2nd:172415] [ 2] ./data(+0x3a432)[0x562c1fab5432]
[songyi719-thinkpad-x1-extreme-2nd:172415] [ 3] ./data(+0x98d9)[0x562c1fa848d9]
[songyi719-thinkpad-x1-extreme-2nd:172415] [ 4] [songyi719-thinkpad-x1-extreme-2nd:172413] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fe5dd1ec3c0]
[songyi719-thinkpad-x1-extreme-2nd:172413] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7fe5dd3d571b]
[songyi719-thinkpad-x1-extreme-2nd:172413] [songyi719-thinkpad-x1-extreme-2nd:172412] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f021418a3c0]
[songyi719-thinkpad-x1-extreme-2nd:172412] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7f021437371b]
[songyi719-thinkpad-x1-extreme-2nd:172412] [ 2] [songyi719-thinkpad-x1-extreme-2nd:172414] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172414] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172414] Signal code: Address not mapped (1)
[songyi719-thinkpad-x1-extreme-2nd:172414] Failing at address: 0x440000e8
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f0c3a3be0b3]
[songyi719-thinkpad-x1-extreme-2nd:172415] [ 5] ./data(+0xa33e)[0x562c1fa8533e]
[songyi719-thinkpad-x1-extreme-2nd:172415] *** End of error message ***
[songyi719-thinkpad-x1-extreme-2nd:172414] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fc68e9043c0]
[songyi719-thinkpad-x1-extreme-2nd:172414] [ 1] /usr/local/lib/libmpi.so.40(MPI_Comm_rank+0x3b)[0x7fc68eaed71b]
[songyi719-thinkpad-x1-extreme-2nd:172414] [ 2] ./data(+0x3a432)[0x55e7f5786432]
[songyi719-thinkpad-x1-extreme-2nd:172414] [ 3] ./data(+0x98d9)[0x55e7f57558d9]
[songyi719-thinkpad-x1-extreme-2nd:172414] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fc68e7240b3]
[songyi719-thinkpad-x1-extreme-2nd:172414] [ 5] ./data(+0xa33e)[0x55e7f575633e]
[songyi719-thinkpad-x1-extreme-2nd:172414] *** End of error message ***
[ 2] ./data(+0x3a432)[0x560705a04432]
[songyi719-thinkpad-x1-extreme-2nd:172413] [ 3] ./data(+0x98d9)[0x5607059d38d9]
[songyi719-thinkpad-x1-extreme-2nd:172413] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fe5dd00c0b3]
[songyi719-thinkpad-x1-extreme-2nd:172413] [ 5] ./data(+0xa33e)[0x5607059d433e]
[songyi719-thinkpad-x1-extreme-2nd:172413] *** End of error message ***
./data(+0x3a432)[0x559eacf7a432]
[songyi719-thinkpad-x1-extreme-2nd:172412] [ 3] ./data(+0x98d9)[0x559eacf498d9]
[songyi719-thinkpad-x1-extreme-2nd:172412] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f0213faa0b3]
[songyi719-thinkpad-x1-extreme-2nd:172412] [ 5] ./data(+0xa33e)[0x559eacf4a33e]
[songyi719-thinkpad-x1-extreme-2nd:172412] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 3 with PID 0 on node songyi719-thinkpad-x1-extreme-2nd exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
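As you can see, the same error output is interleaved and repeated 4 times. I removed and reinstalled Open MPI, but the error is still repeated 4 times.
How can this happen? How can I change this into a single, simple error message that is not repeated?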
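
To make the repetition easier to inspect, here is a minimal sketch that groups the output by the PID in each `[hostname:PID]` prefix. The prefixes in the log carry four different PIDs (172412 to 172415), and each one's backtrace passes through MPI_Comm_rank, which seems consistent with four separate processes each printing their own report to the same terminal. The function name `group_by_pid` and the abbreviated `sample` string are assumptions made for illustration. The sketch only handles lines that begin with a prefix, so fragments that were split mid-line by the interleaving are ignored.

```python
import re
from collections import defaultdict

# Matches the "[hostname:PID]" prefix seen at the start of each report line.
PREFIX = re.compile(r"\[([^\[\]\s:]+):(\d+)\]")


def group_by_pid(log_text):
    """Return {pid: [line, ...]} for lines that start with a [host:pid] prefix.

    Hypothetical helper for illustration; lines without a prefix
    (for example fragments broken by interleaving) are skipped.
    """
    groups = defaultdict(list)
    for line in log_text.splitlines():
        m = PREFIX.match(line)
        if m:
            groups[int(m.group(2))].append(line)
    return dict(groups)


# Abbreviated excerpt: a few prefixed lines copied from the output above.
sample = """\
[songyi719-thinkpad-x1-extreme-2nd:172415] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172415] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172412] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172412] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172413] *** Process received signal ***
[songyi719-thinkpad-x1-extreme-2nd:172413] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172414] Signal: Segmentation fault (11)
[songyi719-thinkpad-x1-extreme-2nd:172414] Signal code: Address not mapped (1)
"""

groups = group_by_pid(sample)

# The distinct PIDs that reported a signal.
print(sorted(groups))

# Show only one process's report instead of all of them.
first_pid = min(groups)
print("\n".join(groups[first_pid]))
```

Saving the full output to a file and passing its contents to `group_by_pid` would give one list of lines per process, from which a single report can be kept. This only post-processes the text; it does not change what mpiexec or the MPI library prints.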