parallel-processing - Openmp scheduling

Question

I have a piece of code with two nested for loops. When the first one has few steps, the second one has many, and vice versa. I can run either for loop with an omp for directive independently, and I get consistent results (and some speedup). However, I'd like to:

Run the first one in parallel if it has 16 steps or more
Else run the second one in parallel (but not the first one even if it has 8 steps)

This is not nested parallelism, because either one loop is parallel or the other. If I run them independently and run top -H to see the threads, I sometimes observe only one thread and sometimes more (in each case). So would what I want to do make sense, and would it actually improve performance?

So far I did something like this:

#pragma omp parallel
{
    #pragma omp for schedule(static,16)
    for(...){
        /* some declarations */
        #pragma omp for schedule(static,16) nowait
        for(...){
            /* ... */
        }
    }
}

which does not compile (work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region) and which would not behave as I described anyway. I also tried collapse but had problems with the "/* some declarations */", and I'd like to avoid it since it's OpenMP 3.0 and I'm not sure the target hardware's compiler will support it.

Any ideas?

score 1 · Accepted Answer

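You cannot nest work-sharing constructs that bind to the same parallel region, but you can use nested parallelism and selectively deactivate the regions with the if(condition) clause. If condition evaluates to true at run time, the region is active; otherwise it executes serially. It would look like this: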

/* Make sure nested parallelism is enabled */
omp_set_nested(1);

#pragma omp parallel for schedule(static) if(outer_steps>=16)
for(...){
    /* some declarations */
    #pragma omp parallel for if(outer_steps<16)
    for(...){
        /* ... */
    }
}

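The drawback here is that a small overhead is introduced when the inner region is inactive at run time. If you want more efficiency and are prepared to sacrifice maintainability for it, you can write two different implementations of the nested loop and branch to the appropriate one based on the value of outer_steps.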
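
The two-implementation alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: the names work, run_outer_parallel, run_inner_parallel, run_loops and the out buffer are hypothetical placeholders for the real loop body and data, and the threshold of 16 is taken from the question. It uses only parallel for, so it should not need OpenMP 3.0 features or nested parallelism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop body: stands in for the real per-element work. */
static double work(int i, int j)
{
    return (double)i * 0.5 + (double)j;
}

/* Variant A: only the outer loop is shared among the threads. */
static void run_outer_parallel(double *out, int outer_steps, int inner_steps)
{
    int i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < outer_steps; i++) {
        int j; /* declared inside the parallel loop, so private per thread */
        for (j = 0; j < inner_steps; j++)
            out[(size_t)i * inner_steps + j] = work(i, j);
    }
}

/* Variant B: the outer loop runs serially; each inner loop is shared. */
static void run_inner_parallel(double *out, int outer_steps, int inner_steps)
{
    int i;
    for (i = 0; i < outer_steps; i++) {
        int j; /* iteration variable of the parallel loop, implicitly private */
        #pragma omp parallel for schedule(static)
        for (j = 0; j < inner_steps; j++)
            out[(size_t)i * inner_steps + j] = work(i, j);
    }
}

/* Dispatcher: pick the variant once, based on the outer trip count. */
void run_loops(double *out, int outer_steps, int inner_steps)
{
    assert(outer_steps >= 0 && inner_steps >= 0);
    if (outer_steps >= 16)
        run_outer_parallel(out, outer_steps, inner_steps);
    else
        run_inner_parallel(out, outer_steps, inner_steps);
}
```

Calling run_loops(out, outer_steps, inner_steps) with out pointing to at least outer_steps * inner_steps doubles fills the buffer in either case. Note that variant B opens a new parallel region on every outer iteration, which is probably acceptable only because that path is taken when the outer loop is short.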
