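I am seeing inconsistent behavior from a program parallelized with OpenMP.

When I run it, it prints its current stage, so the expected output is "2 3 4 5" and so on. The first few stages are usually 1 to 2 seconds apart (when running in parallel on 4 cores).

However, without recompiling or changing anything, the software sometimes hangs immediately after printing 2 (which is printed just before the first parallel code executes).

It does not slow down; it actually stops computing. I ran it under gdb and confirmed that it hangs inside OpenMP:

(There are more than 4 threads because of hyper-threading.)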

[New Thread 0x7ffff6c78700 (LWP 25878)]
[New Thread 0x7ffff6477700 (LWP 25879)]
[New Thread 0x7ffff5c76700 (LWP 25880)]
[New Thread 0x7ffff5475700 (LWP 25881)]
[New Thread 0x7ffff4c74700 (LWP 25882)]
[New Thread 0x7ffff4473700 (LWP 25883)]
[New Thread 0x7ffff3c72700 (LWP 25884)]
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7641fd4 in ?? () from /usr/lib/libgomp.so.1
(gdb) up
#1  0x00007ffff7640a9e in ?? () from /usr/lib/libgomp.so.1
(gdb) 
#2  0x0000000000408ae8 in Redcraft::createStructures (this=0x7fffffffd8d0) at source/redcraft.cpp:512
512 #pragma omp parallel for private(node)

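Originally the pragma specified schedule(dynamic), but keeping or removing it does not change how consistently the hang occurs. Finally, I tried enabling and disabling omp_set_dynamic(), which also had no effect.

Any debugging suggestions?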

1 Answer

This usually happens when there is a data race. You'll have to post the code block that is being parallelized. Basically, what needs to be determined is how the threads use the data. Rerunning without recompiling doesn't guarantee the same thread execution order, which is why these kinds of problems show up only some of the time. Are you working with files? You'll have to close them before rerunning.
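
Since the loop body was not posted, here is only a minimal sketch of the kind of fix a data race usually needs. The names `process`, `sum_processed`, and `total` are hypothetical and are not taken from redcraft.cpp. The idea is that every variable written inside the loop should be either private to a thread (for example, declared inside the loop body, which has the same effect as `private(node)`) or combined through a `reduction` clause, so that no two threads write to the same shared object without synchronization.

```cpp
#include <vector>

// Hypothetical stand-in for the per-node work; not from the original program.
static long process(int node) {
    return static_cast<long>(node) * node;
}

// Sums process(node) over all nodes without a data race.
long sum_processed(const std::vector<int>& nodes) {
    long total = 0;  // shared accumulator, made safe by the reduction clause
    const long n = static_cast<long>(nodes.size());

    // Each thread accumulates into its own copy of `total`;
    // OpenMP adds the copies together when the loop ends.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        // Declared inside the loop, so every thread has its own `node`.
        const int node = nodes[i];
        total += process(node);
    }
    return total;
}
```

With GCC this would be built with `g++ -fopenmp`; without that flag the pragma is ignored and the loop runs serially, which gives a simple way to check whether a problem appears only in the parallel build.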

answered 2012-10-16T12:09:49.503