Questions tagged [openmp]

OpenMP (Open Multi-Processing) is a cross-platform, shared-memory multiprocessing API for C, C++, and Fortran that supports loop and task parallelization and synchronization through compiler directives, library routines, and environment variables.

0 votes
1 answer
10473 views

openmp - Scheduling clauses in OpenMP

I have a section of code (part of an application) that I am trying to optimize using OpenMP, and I am experimenting with various scheduling policies. In my case, I noticed that the schedule(RUNTIME) clause has an edge over the others (I did not specify a chunk_size). I have two questions:

  1. When I do not specify a chunk_size, is there a difference between schedule(DYNAMIC) and schedule(GUIDED)?

  2. How does OpenMP determine the default, implementation-specific scheduling that is stored in the OMP_SCHEDULE variable?

I learned that if no scheduling scheme is specified, schedule(STATIC) is used by default. So, if I do not modify the OMP_SCHEDULE variable and use schedule(RUNTIME) in my program, will the scheduling scheme remain schedule(STATIC) all along, or does OpenMP have some intelligent way of designing the scheduling policy dynamically and changing it from time to time?
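
For reference, a minimal sketch of how schedule(runtime) picks up the schedule from the OMP_SCHEDULE environment variable (my own illustration; the loop body and problem size are assumptions, not taken from the question):

    // schedule(runtime) defers the choice of schedule to OMP_SCHEDULE or to
    // omp_set_schedule(); if neither is set, an implementation-defined default applies.
    // Build:  g++ -fopenmp schedule_demo.cpp
    // Run:    OMP_SCHEDULE="guided,4" ./a.out
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1000000;                     // assumed problem size
        std::vector<double> a(n, 1.0), b(n, 0.0);

        #pragma omp parallel for schedule(runtime) // schedule picked at run time
        for (int i = 0; i < n; ++i)
            b[i] = 2.0 * a[i];

        std::printf("done with up to %d threads\n", omp_get_max_threads());
        return 0;
    }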

0 votes
3 answers
1984 views

fortran - Segmentation fault with -fopenmp for a simple program

I was brushing up on my OpenMP and got stuck in this strange situation. After shaving off a lot of code, I created this minimal, trivial case that shows the problem.

With no flags specified, gfortran 4.3.4 on my Mac (10.6) compiles it and the program executes correctly.

However, if I enable OpenMP with -fopenmp, the program dies with a segmentation fault. Apparently no code gets executed, since it crashes immediately. As you can see, OpenMP is never actually used in the code to parallelize anything. I tried playing with the stack size, both with ulimit and with -fmax-stack-var-size; in any case, ten million reals is not what I would call a large array.

What am I doing wrong?
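
For context, gfortran's -fopenmp implies -frecursive, which moves large local arrays onto the (thread) stack, so a program that is fine when built serially can overflow the stack at startup. A C++ analogue of the stack-versus-heap difference (my own sketch, not the asker's Fortran code):

    // Large objects with automatic storage live on the stack and can exceed the
    // default stack limit; putting them on the heap avoids the problem.
    #include <cstdio>
    #include <vector>

    int main() {
        // double a[10000000];             // ~80 MB of automatic storage: likely to segfault
        std::vector<double> a(10000000);   // heap allocation: safe either way

        a[0] = 1.0;
        std::printf("%f\n", a[0]);
        return 0;
    }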

0 votes
1 answer
1235 views

openmp - Class member variables and OpenMP

I have a scenario like the following:

The function is later called by some object, like obj.abovefunction(x). Since val1 and val2 are declared/initialized in somemethod.h, I cannot use them as private(val1, val2), and something like private(this->val1, this->val2) is not possible either. Could anyone tell me the best way to parallelize with OpenMP in this situation, where the variables are part of the class and are not declared in the immediate scope of the block of code the OpenMP pragma is applied to?

I asked the same question on the OpenMP forum - http://openmp.org/forum/viewtopic.php?f=3&t=886#p3516

Thanks,
Sayan
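
A minimal sketch of the usual workarounds (my own illustration; the class and member names are assumptions, since the asker's code is not shown): either copy the members into locals, which can appear in OpenMP data-sharing clauses, or treat the members reached through this as shared and synchronize any updates:

    #include <omp.h>

    class SomeMethod {                       // hypothetical stand-in for somemethod.h
        double val1 = 1.0, val2 = 2.0;       // members declared outside the pragma's scope
    public:
        double abovefunction(int n) {
            // Workaround 1: copy members into locals so they can be firstprivate.
            double v1 = val1, v2 = val2, sum = 0.0;
            #pragma omp parallel for firstprivate(v1, v2) reduction(+ : sum)
            for (int i = 0; i < n; ++i)
                sum += v1 * i + v2;

            // Workaround 2: members accessed via `this` are shared by default;
            // protect writes to them explicitly.
            #pragma omp parallel for
            for (int i = 0; i < n; ++i) {
                #pragma omp atomic
                val1 += 1.0;
            }
            return sum;
        }
    };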

0 votes
2 answers
897 views

macos - Is there a way to get OpenMP to run on Qt-spawned threads?

I am trying to parallelize the number-crunching part of an application to take advantage of the quad-core architecture, using OpenMP and GCC 4.2 on Mac OS 10.5. But I think the problem is that the application uses Qt for the GUI, and I am trying to fork worker threads from a secondary thread created by Qt, which makes the program crash - but I am not sure about that.

I am pretty much in the dark here, since this is the first time I am using Qt or OpenMP (or C++). Any kind of guidance is greatly appreciated.
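
For what it is worth, an OpenMP parallel region can normally be opened from inside a QThread's run(); a minimal sketch (my own illustration; the NumberCruncher class is a hypothetical stand-in for the app's worker):

    #include <QThread>
    #include <omp.h>
    #include <cstdio>

    // Hypothetical worker: Qt spawns the thread, and OpenMP forks its own
    // team of threads inside that thread's run() method.
    class NumberCruncher : public QThread {
    protected:
        void run() {
            double sum = 0.0;
            #pragma omp parallel for reduction(+ : sum)
            for (int i = 0; i < 1000000; ++i)
                sum += i * 1e-6;
            std::printf("sum = %f (max OpenMP threads: %d)\n", sum, omp_get_max_threads());
        }
    };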

0 votes
1 answer
252 views

visual-c++ - OMPTL in Visual Studio?

I am trying to use the OMPTL in Visual Studio. As I understand it, I only need to set the /openmp option so that OMPTL uses multi-threaded implementations of some of the STL functions.

When I do not use /openmp, everything is fine and OMPTL maps the functions to their normal STL counterparts, without multithreading. With /openmp, however, I get a compiler error:

The offending line says

Is there a way around this, or does OMPTL simply not work with Microsoft's compiler?
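
As a side note, OMPTL presumably keys off the _OPENMP macro to decide between its serial and parallel code paths, so a quick sanity check of what /openmp actually enables in MSVC can help (my own sketch, independent of OMPTL):

    // Build in Visual Studio with and without /openmp and compare the output.
    #include <cstdio>

    int main() {
    #ifdef _OPENMP
        // MSVC typically reports 200203 here, i.e. the OpenMP 2.0 specification.
        std::printf("_OPENMP = %d\n", _OPENMP);
    #else
        std::printf("OpenMP support is not enabled\n");
    #endif
        return 0;
    }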

0 votes
1 answer
1188 views

python - Linking with OpenMP using ctypes

I have a C99 function that uses OpenMP, and it works as expected. I have also written a Python interface using ctypes, which is causing a problem: ctypes/Python cannot find the OpenMP library. Here is the error message:

I am using these commands:

I have searched the web and found a "solution", but I do not understand what it means:

"I guess I should set the restype on the constructor to ctypes.c_void_p, and I should set the corresponding type in the argtypes of the calling function to ctypes.c_void_p. Would this cause the necessary conversion to happen? I would like to confirm that this is the right way to handle this situation."

What does the solution mean, or do you know of another way?

[Update]

So, with the help of Iulian Şerbănoiu, here are the correct command-line options:
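
The gist, as I understand it, is that the shared library has to be both compiled and linked with OpenMP enabled so that the GOMP runtime symbols resolve when ctypes loads it. A hypothetical sketch of the library side, written here as C++ (the file names, function name, and build command are my assumptions, not the asker's actual code):

    // Build as a shared library with OpenMP at BOTH compile and link time, e.g.
    // (hypothetical command):
    //     g++ -fopenmp -fPIC -shared mylib.cpp -o mylib.so
    // then load mylib.so from Python via ctypes.CDLL("./mylib.so").
    #include <omp.h>

    extern "C" double sum_array(const double *x, int n) {
        double s = 0.0;
        #pragma omp parallel for reduction(+ : s)   // parallel accumulation
        for (int i = 0; i < n; ++i)
            s += x[i];
        return s;
    }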

0 votes
1 answer
933 views

openmp - OpenMP - executing threads on chunks

I have the following piece of code that I would like to parallelize in a certain way. I am making a mistake, and hence not all the threads are running the loop as I thought they should. It would be great if somebody could help me figure out that mistake.

This is the code that computes the histogram.

The iCount variable keeps track of the number of iterations, and I noticed a significant difference between the serial and parallel versions. I am guessing that not all the threads are running, and hence the histogram values I obtain from the parallel program are much smaller than the actual readings (the dense array stores the histogram values).

Thanks,
Sayan
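
A common cause of undercounting in a parallel histogram is unsynchronized updates to the shared bins and to the iteration counter. A race-free sketch (my own illustration; the bin count, the input data, and the names dense/iCount are assumptions, since the asker's code is not shown):

    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int nBins = 256;                        // assumed number of bins
        std::vector<int> data(1000000);               // assumed input data
        for (long i = 0; i < (long)data.size(); ++i)
            data[i] = (int)(i % nBins);

        std::vector<long> dense(nBins, 0);            // shared histogram
        long iCount = 0;                              // iteration counter

        #pragma omp parallel
        {
            std::vector<long> local(nBins, 0);        // per-thread bins: no races
            #pragma omp for reduction(+ : iCount)
            for (long i = 0; i < (long)data.size(); ++i) {
                ++local[data[i]];
                ++iCount;
            }
            #pragma omp critical                      // merge the partial histograms once
            for (int b = 0; b < nBins; ++b)
                dense[b] += local[b];
        }

        std::printf("iterations = %ld, bin 0 count = %ld\n", iCount, dense[0]);
        return 0;
    }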

0 votes
1 answer
3111 views

c++ - OpenMP, C++ and iterators

To iterate over the elements of a container, I would typically use an iterator, like so:

Now, if I wanted to parallelize the loop using OpenMP, I might try something like:

However, when I run said code, no changes are made to the container. If I use a typical index into the container instead, the parallel code works fine. I would like to know whether iterators can be used in the context of OpenMP, or whether I need to convert the iterator loop into an index loop.
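
For reference, OpenMP 2.5 only accepts loops over a signed integer index in canonical form, whereas OpenMP 3.0 also accepts random-access iterators; a small sketch of both forms (my own illustration, assuming a std::vector<int> named vec):

    #include <omp.h>
    #include <vector>

    int main() {
        std::vector<int> vec(1000, 1);

        // Index form: works with any OpenMP-capable compiler (OpenMP 2.x onward).
        #pragma omp parallel for
        for (int i = 0; i < (int)vec.size(); ++i)
            vec[i] *= 2;

        // Iterator form: requires OpenMP 3.0+ and a random-access iterator;
        // note the `<` comparison required by the canonical loop form.
        #pragma omp parallel for
        for (std::vector<int>::iterator it = vec.begin(); it < vec.end(); ++it)
            *it *= 2;

        return 0;
    }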

0 votes
2 answers
979 views

language-agnostic - Multiple levels of parallelism using OpenMP - Possible? Smart? Practical?

I am currently working on a C++ sparse matrix/math/iterative solver library for a simulation tool I manage. I would have preferred to use an existing package; however, after extensive investigation, none were found that were appropriate for our simulator (we looked at FLENS, IT++, PETSc, Eigen, and several others). The good news is my solvers and sparse matrix structures are now very efficient and robust. The bad news is, I am now looking into parallelization using OpenMP, and the learning curve is a bit steep.

The domain we solve can be broken into sub-domains, which come together in a block-diagonal format. So our storage scheme ends up looking like an array of smaller square matrices (blocks[]) each with a format appropriate to the sub-domain (e.g. Compressed Row Storage: CRS, Compressed Diagonal Storage: CDS, Dense, etc..), and a background matrix (currently using CRS) that accounts for the connectivity between sub-domains.

The "hot spot" in most (all?) iterative solvers is the Matrix Vector multiplication operation, and this is true of my library. Thus, I've been focusing my efforts on optimizing my MxV routines. For the block diagonal structure, the pseudo code for M*x=b would be as follows:

where background_matrix is the background (CRS) matrix, blocks is the array of sub-domain matrices, and .range returns the portion of the vector from a starting index to an ending index.

Obviously the loop can be (and has been) parallelized, as the operations are independent of other iterations of the loop (the ranges are non-overlapping). Since we have 10-15 blocks in a typical system, 4+ threads actually make a significant difference.

The other place where parallelization has been seen to be a good option is in the MxV operation for each sub-domain storage scheme (the calls in lines 1 and 6 in the above code). There is plenty out there on parallelizing CRS, CDS, and dense matrix MxV operations. Typically a nice boost is seen with 2 threads, with greatly diminishing returns as more threads are added.

I am envisioning a scheme where 4 threads would be used in the block loop for the above code, and each of those threads would use 2 threads for the sub-domain solves. However, I am not sure how, using OpenMP, one would manage the pool of threads: is it possible to limit the number of threads in an OpenMP for loop? Is this multi-level parallelism something that makes sense in practice? Any other thoughts on what I have proposed here would be appreciated (and thanks for reading all the way to the end!)
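
OpenMP does support this kind of two-level scheme through nested parallelism, and the num_threads clause limits the team size at each level; a minimal sketch (my own illustration, with placeholder work standing in for the actual MxV routines):

    #include <omp.h>
    #include <cstdio>

    int main() {
        omp_set_nested(1);                       // enable nested regions (or OMP_NESTED=true)
        const int nBlocks = 12;                  // assumed number of sub-domain blocks

        #pragma omp parallel for num_threads(4)  // outer level: loop over blocks
        for (int b = 0; b < nBlocks; ++b) {
            #pragma omp parallel num_threads(2)  // inner level: per-block MxV team
            {
                // placeholder for the sub-domain matrix-vector product
                std::printf("block %d: outer thread %d, inner thread %d\n",
                            b, omp_get_ancestor_thread_num(1), omp_get_thread_num());
            }
        }
        return 0;
    }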

0 votes
3 answers
2664 views

fortran - OpenMP & MPI explanation

A few minutes ago I stumbled upon some text that reminded me of something I have been wondering about for a while, but had nowhere to ask.

So, in the hope that this may be the place where people have hands-on experience with both, I was wondering if someone could explain the difference between OpenMP and MPI.

I've read the Wikipedia articles in full, understood them in segments, but am still pondering; for a Fortran programmer who wishes one day to enter the world of parallelism (just learning the basics of OpenMP now), what is the more future-proof way to go?

I would be grateful for all your comments.
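
To make the contrast concrete, a minimal pair of sketches (my own illustration): OpenMP forks threads that share one address space within a single machine, while MPI launches separate processes, each with its own memory, that exchange data via messages and can span many machines.

    // OpenMP flavour: one process, several threads sharing memory.
    // Build: g++ -fopenmp hello_omp.cpp
    #include <omp.h>
    #include <cstdio>

    int main() {
        #pragma omp parallel
        std::printf("thread %d of %d (shared memory)\n",
                    omp_get_thread_num(), omp_get_num_threads());
        return 0;
    }

    // MPI flavour: several processes with private memory, communicating by messages.
    // Build/run: mpicxx hello_mpi.cpp -o hello_mpi && mpirun -np 4 ./hello_mpi
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        std::printf("process %d of %d (distributed memory)\n", rank, size);
        MPI_Finalize();
        return 0;
    }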