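I have C++ code that uses OpenMP and is linked into a Fortran 90 program. Run with one thread, everything works. Run with any number of threads other than 1, it segfaults on leaving the C++ part. The results are accurate and error-free, and the run is smooth right up to that exit. The OpenMP-related part of the code is: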

#pragma omp parallel for shared(even_phi,odd_phi,odd_divisor,odd_start_index,odd_iter_index) private(ii,jj,kk,cc,io,pp,f1,f2,f3,f4,f5,f6,ff,tmp_phi) schedule(static)
            for (kk=1; kk<nz-1; kk++)
            {
                cc = (kk-1)*(ny-2);

                for (jj=1; jj<ny-1; jj++)
                {
                    io = odd_start_index[cc];
                    pp = odd_iter_index[cc++];

                    for (ii=io; ii<maxElem; ii++)
                    {
                        f1 = even_phi[pp-odown];
                        f2 = even_phi[pp-oright];
                        f3 = even_phi[pp];
                        tmp_phi = odd_phi[pp];
                        f4 = even_phi[pp+1];
                        f5 = even_phi[pp+oleft];
                        f6 = even_phi[pp+oup];

                        ff = f1+f2+f3+f4+f5+f6;

                        odd_phi[pp] = odd_divisor[pp]*ff + c2*tmp_phi;

                        pp++;
                    }
                }
            }
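
As an aside, the same loop pattern can be written with the temporaries declared inside the loop body, which makes them private to each thread without a long private(...) list. What follows is only a minimal sketch under assumptions: the function name sweep_rows, the row_start/row_len layout, and the three-point neighbour sum are hypothetical simplifications rather than the original solver, and nothing here is claimed to fix the crash.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified sweep (not the original solver).
// Row r updates odd_phi[row_start[r] .. row_start[r] + row_len - 1].
// The caller must pass non-overlapping ranges with row_start[r] >= 1 and
// row_start[r] + row_len < even_phi.size(), so that pp-1 and pp+1 stay in bounds.
void sweep_rows(const std::vector<double>& even_phi,
                std::vector<double>& odd_phi,
                const std::vector<double>& odd_divisor,
                const std::vector<std::size_t>& row_start,
                std::size_t row_len,
                double c2)
{
    const long n_rows = static_cast<long>(row_start.size());

    // Arrays and scalars declared outside the loop are shared by default;
    // everything declared inside the loop body is private to each thread.
    #pragma omp parallel for schedule(static)
    for (long r = 0; r < n_rows; ++r) {
        std::size_t pp = row_start[r];
        for (std::size_t ii = 0; ii < row_len; ++ii, ++pp) {
            // Three-point stand-in for the six-neighbour sum in the question.
            const double ff = even_phi[pp - 1] + even_phi[pp] + even_phi[pp + 1];
            odd_phi[pp] = odd_divisor[pp] * ff + c2 * odd_phi[pp];
        }
    }
}
```

Built without -fopenmp the pragma is ignored and the function runs serially, which gives a convenient reference for comparing against a multi-threaded run.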

This is standard numerical solver code. It runs perfectly both without OpenMP and with OMP_NUM_THREADS=1. Run with more threads, after an almost entirely normal execution, Valgrind says:

==23723== Thread 20:
==23723== Jump to the invalid address stated on the next line
==23723==    at 0x2A6EBBB8: ???
==23723==    by 0x2A6EA515: ???
==23723==  Address 0x2a6ebbb8 is not stack'd, malloc'd or (recently) free'd
==23723== 
==23723== 
==23723== Process terminating with default action of signal 11 (SIGSEGV)
==23723==  Access not within mapped region at address 0x2A6EBBB8
==23723==    at 0x2A6EBBB8: ???
==23723==    by 0x2A6EA515: ???
==23723==  If you believe this happened as a result of a stack
==23723==  overflow in your program's main thread (unlikely but
==23723==  possible), you can try to increase the size of the
==23723==  main thread stack using the --main-stacksize= flag.
==23723==  The main thread stack size used in this run was 1048576.
==23723== 
==23723== HEAP SUMMARY:
==23723==     in use at exit: 632,995,339 bytes in 101 blocks
==23723==   total heap usage: 10,071 allocs, 9,970 frees, 1,257,933,743 bytes allocated
==23723== 
==23723== Thread 1:
==23723== 6,992 bytes in 23 blocks are possibly lost in loss record 47 of 74
==23723==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==23723==    by 0x35A0E11812: _dl_allocate_tls (dl-tls.c:300)
==23723==    by 0x35A1E07068: pthread_create@@GLIBC_2.2.5 (allocatestack.c:571)
==23723==    by 0x2A6EA981: ???
==23723==    by 0x2A4C666E: ???
==23723==    by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723==    by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723==    by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723== 
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 50 of 74
==23723==    at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723==    by 0x2A4C6394: ???
==23723==    by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723==    by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723==    by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723== 
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 51 of 74
==23723==    at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723==    by 0x2A4C63BF: ???
==23723==    by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723==    by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723==    by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723== 
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 52 of 74
==23723==    at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723==    by 0x2A4C63EA: ???
==23723==    by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723==    by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723==    by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723== 
==23723== 30,276 bytes in 1 blocks are definitely lost in loss record 53 of 74
==23723==    at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723==    by 0x2A4C6415: ???
==23723==    by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723==    by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723==    by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723== 
==23723== 39,232 bytes in 1 blocks are definitely lost in loss record 57 of 74
==23723==    at 0x4A0674C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==23723==    by 0x2A4C6369: ???
==23723==    by 0x4C8DB7: solvermodule (in /home/tom/bin/solver)
==23723==    by 0x4C6794: MAIN__ (qdiff4v.f90:749)
==23723==    by 0x4C8DF9: main (in /home/tom/bin/solver)
==23723== 
==23723== LEAK SUMMARY:
==23723==    definitely lost: 160,336 bytes in 5 blocks
==23723==    indirectly lost: 0 bytes in 0 blocks
==23723==      possibly lost: 6,992 bytes in 23 blocks
==23723==    still reachable: 632,828,011 bytes in 73 blocks
==23723==         suppressed: 0 bytes in 0 blocks
==23723== Reachable blocks (those to which a pointer was found) are not shown.
==23723== To see them, rerun with: --leak-check=full --show-reachable=yes
==23723== 
==23723== For counts of detected and suppressed errors, rerun with: -v
==23723== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 6 from 6)

gdb says:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5a04700 (LWP 23837)]
0x00007ffff7024bc2 in ?? ()
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.6-4.el6.x86_64         libgfortran-4.4.6-4.el6.x86_64 libgomp-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64

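That obviously doesn't help. I've been playing with GOMP_STACKSIZE and the number of threads, thinking I might have a stack size problem, but to no avail.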
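
When experimenting with these settings, it can help to confirm what the process actually sees at run time. The helper below is a hedged sketch: the function names env_or_unset and omp_settings_summary are hypothetical, and it only reports settings. It does not diagnose the crash.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the value of an environment variable, or "(unset)" if it is not set.
std::string env_or_unset(const char* name)
{
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}

// Hypothetical helper: one-line summary of the thread and stack settings.
// GOMP_STACKSIZE is the libgomp-specific variable; OMP_STACKSIZE is the
// standard OpenMP one.
std::string omp_settings_summary()
{
    std::ostringstream out;
#ifdef _OPENMP
    out << "max_threads=" << omp_get_max_threads();
#else
    out << "max_threads=1 (built without OpenMP)";
#endif
    out << " OMP_NUM_THREADS=" << env_or_unset("OMP_NUM_THREADS")
        << " OMP_STACKSIZE=" << env_or_unset("OMP_STACKSIZE")
        << " GOMP_STACKSIZE=" << env_or_unset("GOMP_STACKSIZE");
    return out.str();
}
```

Printing the summary once at the start of the C++ part shows whether the variables set in the shell actually reached the process.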

I'm missing something. Maybe something silly. And I can't find it.

1 Answer

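This is a bug in GCC. I found a GCC bug report about a problem with using OpenMP together with the iso_c_binding module. After that, I compiled and ran the code with the Intel compiler without any problems.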

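My code is very long, and I don't know how to isolate the problematic part so that I can reproduce the bug and report it. I will do my best to do so.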
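
One possible starting point for such a reduction is a C++ stub with a single OpenMP loop behind a C-linkage entry point, called from a few lines of Fortran. This is a sketch under assumptions: the name solver_stub, its signature, and the Fortran interface shown in the comment are hypothetical, and it is not known whether a stub this small triggers the crash.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical Fortran-side interface for this stub (Fortran 2003):
//
//   interface
//     subroutine solver_stub(n, x) bind(C, name="solver_stub")
//       use iso_c_binding, only: c_int, c_double
//       integer(c_int), value :: n
//       real(c_double) :: x(*)
//     end subroutine solver_stub
//   end interface

// C-linkage entry point: doubles each of the n values in x in place.
extern "C" void solver_stub(int n, double* x)
{
    if (n <= 0 || x == 0) {
        return;
    }

    // Scratch storage standing in for the solver's work arrays; it is
    // released automatically when the function returns.
    std::vector<double> scratch(static_cast<std::size_t>(n));

    // Each iteration writes a distinct element, so no synchronisation is needed.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        scratch[i] = 2.0 * x[i];
    }

    for (int i = 0; i < n; ++i) {
        x[i] = scratch[i];
    }
}
```

If the stub runs cleanly with several threads, pieces of the real solver can be moved into it one at a time until the segmentation fault reappears.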

I'm using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4) on CentOS release 6.3 (Final).

I'll mark this as the answer. If I find anything more useful later, I'll post it here, since it may be useful to others.

answered 2012-12-11T09:49:43.087