I am trying to offload an array computation to a GPU (GTX 1080 Ti) using OpenMP and C++. This is the dummy code I wrote:
#include <omp.h>
#include <cstdio>   // for printf
#include <iostream>
using namespace std;

int main(){
    //int totalSum, ompSum;
    int totalSum=0, ompSum=0;
    const int N = 1000;
    int array[N];
    for (int i=0; i<N; i++){
        array[i]=i;
    }
    #pragma omp target
    {
        #pragma omp parallel private(ompSum) shared(totalSum)
        {
            ompSum=0;
            omp_set_num_threads(100);
            printf ( "Total number of threads are %d!\n", omp_get_num_threads() );
            #pragma omp for
            for (int i=0; i<N; i++){
                ompSum += array[i];
            }
            #pragma omp critical
            totalSum += ompSum;
        }
        printf ( "Caculated sum should be %d but is %d\n", N*(N-1)/2, totalSum );
    }
    return 0;
}
After running the code, this is the output I get:
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Caculated sum should be 499500 but is 499500
The computed sum is correct, but I am curious why it reports only 8 threads when I set 100 threads in the code.
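One thing worth checking for the thread-count question is whether the target region runs on the GPU at all, or falls back to the host CPU; a count of 8 would be consistent with a host default, though that is only a guess here. Also, a call to omp_set_num_threads made inside a parallel region does not resize the region that is already running, so the sketch below requests the count with a num_threads clause instead. This is a minimal sketch, assuming OpenMP 4.5 or later; `probe_target` and `TargetInfo` are hypothetical names introduced for illustration, and the `_OPENMP` guard only keeps the code compilable when OpenMP is disabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type (not part of OpenMP): what the target region saw.
struct TargetInfo {
    bool on_host;     // true if the region ran on the host (initial device)
    int num_threads;  // team size of the parallel region inside the target
};

// Hypothetical helper: runs a tiny target region and reports where it ran
// and how many threads the inner parallel region actually received.
TargetInfo probe_target(int requested_threads) {
    assert(requested_threads > 0);
    int on_host = 1;   // defaults used when compiled without OpenMP
    int threads = 1;
#ifdef _OPENMP
    // Scalars are not copied back from a target region by default,
    // so both results are mapped explicitly.
    #pragma omp target map(from: on_host, threads)
    {
        on_host = omp_is_initial_device();
        // Request the team size with a clause instead of calling
        // omp_set_num_threads() from inside the region.
        #pragma omp parallel num_threads(requested_threads)
        {
            #pragma omp single
            threads = omp_get_num_threads();
        }
    }
#endif
    return TargetInfo{on_host != 0, threads};
}
```

Calling `probe_target(100)` and printing both fields should show whether offloading is taking place; if `on_host` is true, the compiler or runtime is probably not set up for GPU offloading, which is an assumption to verify against the toolchain in use.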
When I place the omp_set_num_threads call directly below #pragma omp target, I get this runtime error:
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
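A possible way to avoid calling omp_set_num_threads inside the target region is to express the thread request and the summation through clauses on a combined construct. The following is a minimal sketch under that assumption, not a confirmed fix for the cuCtxSynchronize error above; `offload_sum` and `threads_per_team` are hypothetical names, and the actual number of threads remains up to the implementation.

```cpp
#include <cassert>

// Hypothetical helper: sums data[0..n) in a target region.
// thread_limit() caps the threads per team via a clause, so no OpenMP
// runtime call is needed inside the offloaded code.
long long offload_sum(const int* data, int n, int threads_per_team) {
    assert(n >= 0 && threads_per_team > 0);
    long long sum = 0;
    // map(to:) copies the input to the device; map(tofrom:) brings the
    // reduced scalar back, since scalars are not copied out by default.
    #pragma omp target teams distribute parallel for \
        map(to: data[0:n]) map(tofrom: sum) \
        reduction(+: sum) thread_limit(threads_per_team)
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

With the array from the code above, `offload_sum(array, N, 128)` should return N*(N-1)/2, that is 499500. The reduction clause takes the place of the private accumulator and the critical section. When the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially.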
I am new to OpenMP and would appreciate it if anyone could help explain this.