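OK, a few weeks ago I posted about a bug I ran into in my OpenCL implementation, but it seems I have to start from scratch. So, how should the following algorithm be implemented in OpenCL?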

int m = 10;
int n = 10;
//arrA[] has m elements
//arrB[] has n elements
//arrC[] has m x n elements

for(int i = 0; i < m; i++)
{
  for(int j = 0; j < n; j++)
  {
    arrC[i * n + j] = arrA[i] * arrB[j];
  }
}

For this case, I just need to know how to use the global and local IDs to approach this problem... because I am a bit lost. Thanks a lot.

1 Answer

This is the code I have so far (it is an extract of the actual code; because I need to obtain the maximum value, a reduction will be performed afterwards).

"__kernel void sampleKernel(__global const double *bufferX,"
"                           __global const double *bufferY,"
"                           __global double *result,"
"                           const int lengthX,"
"                           const int lengthY){"
"    const int index_a = get_global_id(0);"// Global index along dimension 0, indexes bufferX
"    const int index_b = get_global_id(1);"// Global index along dimension 1, indexes bufferY
"    if (index_a < lengthX && index_b < lengthY) {"// Skip work-items outside the lengthX x lengthY grid
"        const int flat_index = index_a * lengthY + index_b;"// Row-major position in result
"        result[flat_index] = bufferX[index_a] * bufferY[index_b];"
"    }"
"}";

Maybe I should also use get_local_id(1), and compute the thread ID as local_id_1 * N + local_id_2, where N is the maximum local_id_2 value.
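
To make the ID arithmetic concrete, here is a minimal sketch in plain C that emulates a 2D NDRange on the host. It is not OpenCL API code: global_id and outer_product_ndrange are hypothetical helper names, and the nested loops stand in for work-groups and work-items. It assumes a zero global offset, in which case get_global_id(d) equals get_group_id(d) * get_local_size(d) + get_local_id(d), and it assumes the result is stored in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the global ID of a work-item along one dimension,
 * assuming a zero global offset. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Hypothetical host-side emulation of a 2D launch: each innermost iteration
 * plays the role of one work-item writing c[i * n + j] = a[i] * b[j]. */
void outer_product_ndrange(const double *a, const double *b, double *c,
                           size_t m, size_t n,
                           size_t local0, size_t local1)
{
    assert(local0 > 0 && local1 > 0);

    /* Round the group counts up so the grid covers all m x n elements. */
    size_t groups0 = (m + local0 - 1) / local0;
    size_t groups1 = (n + local1 - 1) / local1;

    for (size_t g0 = 0; g0 < groups0; g0++)
    for (size_t g1 = 0; g1 < groups1; g1++)
    for (size_t l0 = 0; l0 < local0; l0++)
    for (size_t l1 = 0; l1 < local1; l1++) {
        size_t i = global_id(g0, local0, l0); /* like get_global_id(0) */
        size_t j = global_id(g1, local1, l1); /* like get_global_id(1) */

        /* Padding work-items beyond the array bounds do nothing. */
        if (i < m && j < n) {
            c[i * n + j] = a[i] * b[j];
        }
    }
}
```

A flat index built only from local IDs, such as local_id_1 * N + local_id_2, repeats in every work-group, so it would probably suit a per-group scratch array in __local memory (for the reduction, say) rather than the global result buffer. In that case N would be the work-group size in the second dimension, i.e. the largest local_id_2 value plus one, not the largest value itself.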

Answered 2013-10-09T02:59:48.897