c - OpenMP custom reduction variable

Question

I've been assigned to implement the idea of a reduction variable without using the reduction clause. I set up this basic code to test it.

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
for (int i = 0; i < n; ++i)
{
    val += 1;
}
sum += val;

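So at the end, sum == n.

Each thread should keep val as a private variable, and the addition to sum should then be a critical section where the threads converge, e.g.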

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
#pragma omp critical
{
    sum += val;
}

I don't know how to keep the private instance of val alive for the critical section. I tried wrapping the whole thing in a larger pragma, e.g.

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}

But I don't get the correct answer. How should I set up the pragmas and clauses to do this?

score 6 · Accepted Answer

Your programs have a number of flaws. Let's look at each program (the flaws are written as comments).

Program 1

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
// At this point the team of OpenMP threads no longer exists.
// "#pragma omp parallel for" creates the threads, and their parallel region
// ends with the for loop. The private copies of val are discarded with it,
// so only one thread (the main thread) enters the critical section below.
#pragma omp critical
{
    sum += val;
}
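
A minimal sketch of the point made in the comments, assuming a hypothetical helper count_critical_entries that is not part of the question: the critical section placed after the loop is executed once, by the single thread that continues after the parallel region has ended.

```c
/* Hypothetical helper: counts how many times the critical section that
   follows a "parallel for" loop is executed. */
int count_critical_entries(int n)
{
    int entries = 0;
    double scratch = 0.0;    /* shared; only updated atomically */

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        #pragma omp atomic
        scratch += 1.0;
    }
    /* The parallel region ended together with the loop, so only the
       initial thread reaches this critical section. */
    #pragma omp critical
    {
        entries += 1;
    }
    return entries;
}
```

Calling count_critical_entries(1000) should return 1 however many threads ran the loop, which is why the per-thread values never reach sum in Program 1.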

Program 2

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
 // "#pragma omp parallel" creates the threads
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
  // There is no need to create another team of threads here.
  // Note that "#pragma omp parallel" always opens a new parallel region,
  // so this is a nested parallel region inside each outer thread, which is wrong:
  // every outer thread runs the whole loop instead of a share of it.
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}

The best solution is

int n = 100000000;
double sum = 0.0;
int nThreads = 5;
#pragma omp parallel shared(sum, n) num_threads(nThreads) // Create the OpenMP threads, and declare the shared and private variables here.
// Also declare the maximum number of threads.
// Note that num_threads(nThreads) does not guarantee that exactly nThreads OpenMP threads are created;
// it only sets the maximum number of threads that can be created.
{
    double val = 0.0;  // val is declared as a local variable, so each thread has its own copy
#pragma omp for nowait       // only "omp for" here: the threads already exist, so no "omp parallel" is needed
    // nowait removes the barrier at the end of the for loop: a thread that finishes its iterations
    // can go straight on to the critical section without waiting for the others
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}
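
As a rough way to check the pattern above, the sketch below wraps it in a function and places it next to the reduction clause it is meant to replace. The function names sum_with_clause and sum_by_hand are illustrative assumptions, not part of the original answer.

```c
/* Reference version: let OpenMP perform the reduction. */
double sum_with_clause(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
    {
        sum += 1;
    }
    return sum;
}

/* Hand-written reduction following the pattern above. */
double sum_by_hand(int n, int nThreads)
{
    double sum = 0.0;
    #pragma omp parallel num_threads(nThreads)
    {
        double val = 0.0;           /* one partial sum per thread */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }
        #pragma omp critical
        {
            sum += val;             /* combine partial sums one thread at a time */
        }
    }
    return sum;
}
```

Both functions should return n as a double. If the code is compiled without OpenMP support, the pragmas are ignored and each function simply runs serially.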

score 2 · Accepted Answer

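You don't need to specify shared variables explicitly in OpenMP, because variables from the outer scope are shared by default (unless a default(none) clause is specified). Since private variables have undefined initial values, you should zero the private copy before the accumulation loop. Loop counters are recognized automatically and made private - there is no need to declare them explicitly. Also, since you are only updating a single value, you should use the atomic construct, as it is more lightweight than a full critical section.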

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) num_threads(nthreads)
{
    val = 0.0;
    #pragma omp for
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    #pragma omp atomic update
    sum += val;
}

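The update clause was added to the atomic construct in OpenMP 3.1, so if your compiler conforms to an earlier OpenMP version (e.g. if you use MSVC++, which supports only OpenMP 2.0 even in VS2012), you must remove the update clause. Since val is not used outside the parallel loop, it can be declared in the inner scope as in veda's answer, in which case it automatically becomes a private variable.

Note that parallel for is a shortcut for nesting two OpenMP constructs, parallel and for: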
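
A minimal sketch combining the two remarks in the paragraph above, assuming a hypothetical wrapper function sum_atomic_plain: val is declared inside the parallel region, and the atomic construct is written without the update clause so that it also fits compilers limited to OpenMP versions before 3.1.

```c
/* Hypothetical wrapper: manual reduction with a block-scope val and a
   plain "atomic" construct (no "update" clause). */
double sum_atomic_plain(int n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        double val = 0.0;       /* block-scope: private to each thread, starts at zero */
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }
        #pragma omp atomic      /* applies to the single statement that follows */
        sum += val;
    }
    return sum;
}
```

In OpenMP 3.1 and later, an atomic construct without a clause behaves the same as atomic update, so this form should work with both older and newer compilers.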

#pragma omp parallel for sharing_clauses scheduling_clauses
for (...) {
}

is equivalent to:

#pragma omp parallel sharing_clauses
#pragma omp for scheduling_clauses
for (...) {
}

The same holds for the other two combined constructs: parallel sections and parallel workshare (Fortran only).
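
To make the equivalence concrete, here is a small sketch with two hypothetical functions, fill_combined and fill_split, that fill an array using the combined and the split form respectively. The schedule(static) clause is only an example of a scheduling clause.

```c
/* Combined form: one directive creates the threads and shares out the loop. */
void fill_combined(int *out, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
    {
        out[i] = 2 * i;
    }
}

/* Split form: a parallel region that contains a single worksharing loop. */
void fill_split(int *out, int n)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
        {
            out[i] = 2 * i;
        }
    }
}
```

Both functions should leave the same values in out. The split form is the one to use when each thread needs to do more work in the same region, such as the accumulation into sum after the loop.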
