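I am trying to use 3 OpenMP threads to split the work of multiplying two NxN matrices across 3 nVidia GPUs. (The matrix values get large, hence the long long data type.) However, I cannot work out where to put the #pragma acc parallel loop directive. I followed some examples from the nVidia PDF that was shared, but without success. I know the innermost loop cannot be parallelized. What I want is for each of the three threads to own one GPU and do part of the work. Note that the input and output matrices are defined as global variables because I kept running out of stack memory.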
I have tried the code below, but the compilation errors I get all point to line 75, which is the #pragma acc parallel loop line.
[test@server ~]pgcc -acc -mp -ta=tesla:cc60 -Minfo=all -o testGPU matrixMultiplyopenmp.c
PGC-S-0035-Syntax error: Recovery attempted by replacing keyword for by keyword barrier (matrixMultiplyopenmp.c: 75)
PGC-S-0035-Syntax error: Recovery attempted by replacing acc by keyword enum (matrixMultiplyopenmp.c: 76)
PGC-S-0036-Syntax error: Recovery attempted by inserting ';' before keyword for (matrixMultiplyopenmp.c: 77)
PGC/x86-64 Linux 18.10-1: compilation completed with severe errors
The function is:
void multiplyMatrix(long long int matrixA[SIZE][SIZE], long long int matrixB[SIZE][SIZE], long long int matrixProduct[SIZE][SIZE])
{
    // Initialize the runtime for Nvidia devices
    acc_init(acc_device_nvidia);
    // Get the number of GPUs in the system
    int num_gpus = acc_get_num_devices(acc_device_nvidia);
    // Set the number of OpenMP threads to the number of GPUs
    #pragma omp parallel num_threads(num_gpus)
    {
        // Get the OpenMP thread number and set the GPU device to that number
        int threadNum = omp_get_thread_num();
        acc_set_device_num(threadNum, acc_device_nvidia);
        int row;
        int col;
        int key;
        #pragma omp for
        #pragma acc parallel loop
        for (row = 0; row < SIZE; row++)
            for (col = 0; col < SIZE; col++)
                for (key = 0; key < SIZE; key++)
                    matrixProduct[row][col] = matrixProduct[row][col] + (matrixA[row][key] * matrixB[key][col]);
    }
}
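One direction I am considering, as a rough sketch only: the syntax errors are probably caused by #pragma omp for, which expects a for loop directly after it, being followed by another pragma instead. A possible workaround is to drop omp for, let each thread compute its own block of rows by hand, and put #pragma acc parallel loop on that block alone. In the sketch below, N, A, B, C, row_block and multiply_split are hypothetical names made up for illustration, and the data clauses are my assumption about what each GPU needs. The #ifdef guards only let it build serially when OpenMP or OpenACC is switched off.

```c
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _OPENACC
#include <openacc.h>
#endif

#define N 64 /* hypothetical size, stands in for SIZE */

/* Globals, as in the original, to stay off the stack (names are made up). */
long long A[N][N], B[N][N], C[N][N];

/* Half-open row range [*first, *last) for worker `id` out of `count`. */
void row_block(int id, int count, int n, int *first, int *last)
{
    int base = n / count, extra = n % count;
    *first = id * base + (id < extra ? id : extra);
    *last = *first + base + (id < extra ? 1 : 0);
}

/* Assumption: one OpenMP thread per GPU, each handling one block of rows. */
void multiply_split(int num_workers)
{
    (void)num_workers; /* unused when built without OpenMP */
#pragma omp parallel num_threads(num_workers)
    {
        int id = 0, count = 1, first, last, rows;
#ifdef _OPENMP
        id = omp_get_thread_num();
        count = omp_get_num_threads(); /* actual team size */
#endif
#ifdef _OPENACC
        acc_set_device_num(id, acc_device_nvidia); /* thread id -> GPU id */
#endif
        row_block(id, count, N, &first, &last);
        rows = last - first;
        if (rows > 0) {
            /* No "omp for" here: the acc directive sits directly on the loop. */
#pragma acc parallel loop collapse(2) copyin(A[first:rows][0:N], B[0:N][0:N]) copyout(C[first:rows][0:N])
            for (int row = first; row < last; row++) {
                for (int col = 0; col < N; col++) {
                    long long sum = 0; /* innermost loop stays sequential */
                    for (int key = 0; key < N; key++)
                        sum += A[row][key] * B[key][col];
                    C[row][col] = sum; /* overwrites instead of accumulating */
                }
            }
        }
    }
}
```

Calling multiply_split(num_gpus) after filling A and B would leave the product in C. Unlike my original loop, this writes a local sum rather than adding to the existing contents of the output matrix, so the output does not need to be zeroed first. I have not verified this on the 3-GPU machine.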