I have a piece of code with two nested for loops. When the first (outer) one has few iterations, the second (inner) one has many, and vice versa. I can parallelize either loop on its own with an omp for directive and get consistent results (and some speedup). However, I'd like to:
- Run the first one in parallel if it has 16 steps or more
- Else run the second one in parallel (but not the first one even if it has 8 steps)
This is not nested parallelism, because only one of the two loops is ever parallel. If I parallelize them independently and run top -H to watch the threads, I sometimes see only one thread and sometimes more (in each case). So does what I want to do make sense, and would it actually improve performance?
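As a cross-check on the top -H observation, here is a minimal sketch that counts how many distinct threads actually receive iterations of a loop. The names `threads_used` and `MAX_TRACKED_THREADS` are hypothetical, and the serial fallback is only there so that the sketch also builds without OpenMP. One thing it makes visible: with schedule(static,16), a loop of 16 iterations forms a single chunk, so all of its work lands on one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define MAX_TRACKED_THREADS 256 /* hypothetical upper bound on team size */

/* Returns how many distinct threads executed at least one of the n
   iterations under schedule(static, chunk). chunk must be positive. */
int threads_used(int n, int chunk)
{
    int hits[MAX_TRACKED_THREADS] = {0};
    int i, used = 0;

    #pragma omp parallel for schedule(static, chunk)
    for (i = 0; i < n; ++i) {
        int t = omp_get_thread_num();
        if (t < MAX_TRACKED_THREADS)
            hits[t] = 1; /* each thread writes only its own slot */
    }

    /* Serial tally after the parallel loop has finished. */
    for (i = 0; i < MAX_TRACKED_THREADS; ++i)
        used += hits[i];
    return used;
}
```

Calling `threads_used(16, 16)` reports a single thread regardless of the team size, whereas a smaller chunk such as `threads_used(64, 1)` can spread the iterations over the whole team.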
So far I did something like this:
#pragma omp parallel
{
    #pragma omp for schedule(static,16)
    for(...){
        /* some declarations */
        #pragma omp for schedule(static,16) nowait
        for(...){
            /* ... */
        }
    }
}
This does not compile (work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region), and it would not behave as I described anyway. I also tried collapse, but ran into problems with the "/* some declarations */" part, and I'd like to avoid it since it is an OpenMP 3.0 feature and I'm not sure the compiler for the target hardware will support it.
Any ideas?
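One possible direction, sketched under assumptions rather than tested on the target compiler: put a separate `parallel for` on each loop and let the `if` clause pick which of the two actually forks a team. The `if` clause on `parallel` predates OpenMP 3.0, so it should not need `collapse`. The names `run_loops`, `work` and `OUTER_PARALLEL_MIN` are hypothetical stand-ins for the real code, and the sketch assumes nested parallelism is left disabled (the usual default).

```c
#include <stddef.h>

/* Threshold from the question: parallelize the outer loop at 16+ steps. */
#define OUTER_PARALLEL_MIN 16

/* Hypothetical stand-in for the real loop body. */
static double work(int i, int j) { return (double)i * 0.5 + (double)j; }

/* Fills out[i * n_inner + j]; out must hold n_outer * n_inner doubles. */
void run_loops(int n_outer, int n_inner, double *out)
{
    int i;
    const int outer_par = (n_outer >= OUTER_PARALLEL_MIN);

    /* Outer loop: forks a team only when outer_par is true. Plain
       schedule(static) is used because a chunk size of 16 would hand
       a 16-step loop to a single thread. */
    #pragma omp parallel for if(outer_par) schedule(static)
    for (i = 0; i < n_outer; ++i) {
        int j; /* declared inside the region, so private to each thread */
        /* some declarations */

        /* Inner loop: forks a team only when the outer loop stayed
           serial; otherwise this region runs on one thread. */
        #pragma omp parallel for if(!outer_par) schedule(static)
        for (j = 0; j < n_inner; ++j) {
            out[(size_t)i * n_inner + j] = work(i, j);
        }
    }
}
```

With this layout, `run_loops(8, 40, out)` would parallelize only the inner loop and `run_loops(20, 4, out)` only the outer one. The inner region is still entered once per outer iteration, which adds some overhead, so whether it pays off would have to be measured.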