multithreading - Data corruption when threading Vector Statistical Library - Math Kernel Library

Question

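I have just parallelized a Fortran routine that simulates individual behavior, and I am running into problems when generating random numbers with the Vector Statistical Library (a library from the Math Kernel Library). The program is structured as follows: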

program example
...
!$omp parallel do num_threads(proc) default(none) private(...) shared(...)
do i=1,n
call firstroutine(...)
enddo
!$omp end parallel do
...
end program example

subroutine firstroutine
...
call secondroutine(...)
...
end subroutine

subroutine secondroutine
...
VSL calls
...
end subroutine

I compile with the Intel Fortran compiler, using the following makefile:

f90comp = ifort
libdir = /home
mklpath = /opt/intel/mkl/10.0.5.025/lib/32/
mklinclude = /opt/intel/mkl/10.0.5.025/include/
exec: Example.o Firstroutine.o Secondroutine.o
      $(f90comp) -O3 -fpscomp logicals -openmp -o  aaa -L$(mklpath) -I$(mklinclude) Example.o -lmkl_ia32 -lguide -lpthread
Example.o: $(libdir)Example.f90
       $(f90comp) -O3 -fpscomp logicals -openmp -c $(libdir)Example.f90
Firstroutine.o: $(libdir)Firstroutine.f90
       $(f90comp) -O3 -fpscomp logicals -openmp -c $(libdir)Firstroutine.f90
Secondroutine.o: $(libdir)Secondroutine.f90
       $(f90comp) -O3 -fpscomp logicals -openmp -c -L$(mklpath) -I$(mklinclude) $(libdir)Secondroutine.f90  -lmkl_ia32 -lguide -lpthread

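Everything compiles fine. When I run the program to generate my variables, everything seems to work. However, from time to time (say, once every 200-500 iterations), it generates crazy numbers for a few iterations and then runs normally again. I have not found any pattern in when this corruption happens.

Any idea why this is happening?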

score 0 · Accepted Answer

The random number code is either using a global variable internally, or all threads are using the same generator. Eventually, two threads will try to update the same piece of memory at the same time, and the result will be unpredictable.

So you must allocate one random number generator per thread.

Alternative: if the threads have to share a generator, protect each call to the random routine with a semaphore/lock.
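The per-thread approach could look roughly like the sketch below. It is not the asker's code: the program name, seed, and iteration count are made up for illustration, and it assumes a reasonably recent MKL whose Fortran interface is provided by `mkl_vsl.f90` and which names the uniform method constant `VSL_RNG_METHOD_UNIFORM_STD` (older releases may use different names). Each thread creates its own stream from the MT2203 family, so no generator state is shared between threads.

```fortran
! Sketch only: one VSL stream per OpenMP thread.
! Assumption: the MKL include directory is on the include path (-I$(mklinclude)).
include 'mkl_vsl.f90'

program per_thread_streams
  use mkl_vsl_type
  use mkl_vsl
  use omp_lib
  implicit none

  integer, parameter :: n = 1000          ! hypothetical iteration count
  integer, parameter :: base_seed = 777   ! hypothetical seed
  type(vsl_stream_state) :: stream
  real(kind=8) :: r(1)
  integer :: i, tid, errcode

  !$omp parallel private(stream, r, i, tid, errcode)
  tid = omp_get_thread_num()

  ! MT2203 is a family of independent generators; adding the thread
  ! number to the base id selects a different member for each thread.
  errcode = vslnewstream(stream, VSL_BRNG_MT2203 + tid, base_seed)

  !$omp do
  do i = 1, n
     ! Draw one uniform number on [0, 1) from this thread's private stream.
     errcode = vdrnguniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 1, r, 0.0d0, 1.0d0)
     ! ... pass r (or the stream itself) down to firstroutine/secondroutine ...
  end do
  !$omp end do

  ! Release the stream before the thread leaves the parallel region.
  errcode = vsldeletestream(stream)
  !$omp end parallel
end program per_thread_streams
```

In real code the `errcode` returned by each VSL call should be checked. Passing the stream as an argument through `firstroutine` and `secondroutine` keeps the generator state out of any module-level or saved variable.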

score 0 · Accepted Answer

I got the solution! I was modifying the generated pseudo-random numbers using some values taken from a file. From time to time, more than one thread tried to read the same file at once, and that caused the corruption. To solve this, I added an omp critical section and it worked.
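The asker did not post the fixed code, so the following is only a minimal sketch of what such a critical section might look like. The file name `adjust.dat`, the unit number, and the variable names are all hypothetical. The program first writes a small input file so that it is self-contained, then has every iteration read it inside a named critical section, so that only one thread at a time opens, reads, and closes the file.

```fortran
program critical_read
  implicit none
  integer, parameter :: n = 8      ! hypothetical iteration count
  integer, parameter :: u = 10     ! hypothetical unit number for the file
  integer :: i
  real(kind=8) :: adj, total

  ! Create a small input file so the sketch runs on its own.
  open(unit=u, file='adjust.dat', status='replace', action='write')
  write(u, *) 0.5d0
  close(u)

  total = 0.0d0
  !$omp parallel do default(none) private(i, adj) reduction(+:total)
  do i = 1, n
     ! Only one thread at a time may execute this block, so the shared
     ! unit number and the file are never used by two threads at once.
     !$omp critical (file_io)
     open(unit=u, file='adjust.dat', status='old', action='read')
     read(u, *) adj
     close(u)
     !$omp end critical (file_io)

     ! Work outside the critical section still runs in parallel.
     total = total + adj
  end do
  !$omp end parallel do

  ! Each of the n iterations adds the value read from the file.
  print *, 'total =', total
end program critical_read
```

If the values in the file do not change during the run, reading them once before the parallel region and sharing the resulting array read-only would avoid the serialization entirely.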
