c++ - omp 有序子句如何工作？

Question

vector<int> v;

#pragma omp parallel for ordered schedule(dynamic, anyChunkSizeGreaterThan1)
    for (int i = 0; i < n; ++i){
            ...
            ...
            ...
#pragma omp ordered
            v.push_back(i);
    }

这填充v了一个n大小有序的列表。

当到达omp ordered块时，所有线程都需要等待可能的最低迭代线程完成，但是如果没有一个线程被指定为特定的迭代呢？或者 OpenMP 运行时库是否始终确保最低迭代由某个线程处理？

另外，为什么建议将ordered子句与 ? 一起使用dynamic schedule？会static schedule影响性能吗？

score 54 · Accepted Answer

The ordered clause works like this: different threads execute concurrently until they encounter the ordered region, which is then executed sequentially in the same order as it would get executed in a serial loop. This still allows for some degree of concurrency, especially if the code section outside the ordered region has substantial run time.

There is no particular reason to use dynamic schedule instead of static schedule with small chunk size. It all depends on the structure of the code. Since ordered introduces dependency between the threads, if used with schedule(static) with default chunk size, the second thread would have to wait for the first one to finish all iterations, then the third thread would have to wait for the second one to finish its iterations (and hence for the first one too), and so on. One could easily visualise it with 3 threads and 9 iterations (3 per thread):

tid  List of     Timeline
     iterations
0    0,1,2       ==o==o==o
1    3,4,5       ==.......o==o==o
2    6,7,8       ==..............o==o==o

= shows that the thread is executing code in parallel. o is when the thread is executing the ordered region. . is the thread being idle, waiting for its turn to execute the ordered region. With schedule(static,1) the following would happen:

tid  List of     Timeline
     iterations
0    0,3,6       ==o==o==o
1    1,4,7       ==.o==o==o
2    2,5,8       ==..o==o==o

I believe the difference in both cases is more than obvious. With schedule(dynamic) the pictures above would become more or less random as the list of iterations assigned to each thread is non-deterministic. It would also add an additional overhead. It is only useful if the amount of computation is different for each iteration and it takes much more time to do the computation than is the added overhead of using dynamic scheduling.

Don't worry about the lowest numbered iteration. It is usually handled to the first thread in the team to become ready to execute code.

c++ - omp 有序子句如何工作？

1 回答 1

Related

Reference