I have a piece of code in the following style:

for (set=0; set < n; set++)  //For1
{
    #pragma omp parallel for num_threads(x)
    for (i=0; i < m; i++)   //For2: this loop can be executed in parallel
    {
        commands...
    }

    for (j=0; j < m; j++)   //For3: this loop depends on the output of For2 and must be executed sequentially
    {
        commands...
    }
}

As you can see, I have n independent Sets (the outer loop, i.e. For1). Each Set consists of a parallel loop (For2) and a sequential section (For3) which must be executed after For2.

I already used "#pragma omp parallel for num_threads(x)" on For2 to make it parallel.

Now I want to make the outer loop (For1) parallel as well. In other words, I want to run the Sets in parallel with each other.

I would really appreciate it if you could let me know how this can be done in OpenMP.

One way might be to create n threads, one for each Set. Is that correct? I am also wondering whether there is another way that uses only OpenMP features.

Thanks in advance.

3 Answers

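You can simply parallelize the outer loop. Because i and j are declared outside the loop, they have to be listed as private so that each thread gets its own copy: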

#pragma omp parallel for num_threads(x) private(i,j)
for (set=0; set < n; set++)  //For1
{
    for (i=0; i < m; i++)   //For2: this loop can be executed in parallel
    {
        commands...
    }

    for (j=0; j < m; j++)   //For3: this loop depends on the output of For2 and must be executed sequentially
    {
        commands...
    }
}
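
To make the idea concrete, here is a minimal sketch of the same pattern with the elided commands filled in by made-up work. The function name process_sets, the flat in/out arrays, and the squaring and running-sum steps are all assumptions for illustration, not the asker's actual code. Loop variables are declared inside the loops (C99), so they are private to each thread without a private clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical workload: in and out each hold n rows of m doubles. */
void process_sets(int n, int m, const double *in, double *out)
{
    assert(n >= 0 && m >= 0);

    /* For1: every set is independent, so sets are shared out among threads. */
    #pragma omp parallel for
    for (int set = 0; set < n; set++) {
        const double *src = in + (size_t)set * m;
        double *dst = out + (size_t)set * m;

        /* For2: independent iterations, run serially inside each thread here. */
        for (int i = 0; i < m; i++)
            dst[i] = src[i] * src[i];

        /* For3: sequential, each step depends on the previous one. */
        for (int j = 1; j < m; j++)
            dst[j] += dst[j - 1];
    }
}
```

If the compiler is run without OpenMP enabled, the pragma is ignored and the function still gives the same result serially.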
answered 2013-11-06T17:28:24.510

You could try fusing the first and second loops (see below). I don't know whether this will make it any better, but it's worth a try.

    #pragma omp parallel num_threads(x) private(set, i)
    {
        #pragma omp for schedule(static)
        for (k = 0; k < n*m; k++) //fused For1 and For2
        {
            set = k/m;
            i = k%m;
            //commands...
        }
        #pragma omp for schedule(static)
        for (set = 0; set < n; set++)
        {
            for (i = 0; i < m; i++) //For3 - j is not necessary so reuse i 
            {
                //commands...
            }
        }
    }
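
As an alternative to fusing the loops by hand, OpenMP 3.0 and later offer the collapse clause, which asks the runtime to merge perfectly nested loops into one iteration space. The sketch below swaps that in for the manual k/m and k%m arithmetic. The function name process_sets_collapsed and the squaring and running-sum work are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical workload: in and out each hold n rows of m doubles. */
void process_sets_collapsed(int n, int m, const double *in, double *out)
{
    assert(n >= 0 && m >= 0);

    #pragma omp parallel
    {
        /* For1 and For2 collapsed into a single n*m iteration space. */
        #pragma omp for collapse(2) schedule(static)
        for (int set = 0; set < n; set++)
            for (int i = 0; i < m; i++) {
                size_t idx = (size_t)set * m + i;
                out[idx] = in[idx] * in[idx];
            }
        /* Implicit barrier: all For2 work is done before any For3 starts. */

        /* For3: sets run in parallel, each running sum stays sequential. */
        #pragma omp for schedule(static)
        for (int set = 0; set < n; set++) {
            double *dst = out + (size_t)set * m;
            for (int j = 1; j < m; j++)
                dst[j] += dst[j - 1];
        }
    }
}
```

The collapse clause requires that nothing sits between the two for statements, which is why the index computation lives inside the inner loop body.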
answered 2013-11-07T09:18:33.720

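Depending on how many sets you have, simply parallelizing the outer loop may be your best option. If you have more sets than cores on your computer, this will likely be faster than parallelizing the inner loop, because there is much less thread creation overhead in this case.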

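Assuming your operations are CPU-bound, parallelizing the outer loop will already keep every core on your computer busy. Once all resources are fully used, trying to parallelize the inner loop as well will not make things any faster.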

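If you have fewer sets than available cores, parallelize the inner loop instead; you will most likely already be consuming all the available computing power.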
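
One way to express this choice in code is the if clause on the parallel construct, which turns a region on or off at run time. The sketch below is only an illustration under assumptions: the function name process_sets_adaptive, the helper core_count, the squaring and running-sum work, and the simple "sets versus cores" threshold are all made up here and would need measuring on a real workload.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: processor count seen by OpenMP, or 1 without it. */
static int core_count(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Hypothetical workload: in and out each hold n rows of m doubles. */
void process_sets_adaptive(int n, int m, const double *in, double *out)
{
    assert(n >= 0 && m >= 0);

    /* Enough sets to occupy every core: parallelize For1, else For2. */
    int outer = (n >= core_count());

    #pragma omp parallel for if(outer)
    for (int set = 0; set < n; set++) {
        const double *src = in + (size_t)set * m;
        double *dst = out + (size_t)set * m;

        /* For2: only becomes a parallel region when For1 runs serially. */
        #pragma omp parallel for if(!outer)
        for (int i = 0; i < m; i++)
            dst[i] = src[i] * src[i];

        /* For3: always sequential within a set. */
        for (int j = 1; j < m; j++)
            dst[j] += dst[j - 1];
    }
}
```

Only one of the two regions is active on any call, so the threads are never oversubscribed by nesting.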

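If you really want to parallelize both loops, you should consider MPI and do hybrid parallelization across multiple computers: the outer loop is parallelized across the computers, and the inner loop is parallelized across all the cores of each single computer.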

answered 2013-11-07T10:29:35.433