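This is the code I have so far (it is an extract of the actual code; I need to obtain the maximum value, so a reduction will be performed on the result).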
"sampleKernel(__global const double *bufferX,"
" __global const double *bufferY,"
" __global double* result,"
" __const int lengthX,"
" __const int lengthY){"
" const int index_a = get_global_id(0);"//Get the global indexes for 2D reference
" const int index_b = get_global_id(1);"
" const int local_index = get_local_id(0);"//Current thread id -> Should be the same as index_a * lengthY + index_b;
" if (local_index < (lengthX * lengthY)) {"// Load data into local memory
" if(index_a < lengthX && index_b < lengthY)"
" {"
" result[local_index] = bufferX[index_a] * bufferY[index_b];"
" }"
" } "
"}";
Maybe I should also use get_local_id(1) and compute the thread ID as local_id_1 * N + local_id_2, where N is the maximum local_id_2 value.
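To make the indexing idea concrete, here is a minimal host-side sketch in plain C that emulates the 2D range with two nested loops, one iteration per work-item. It assumes a row-major layout and that `result` holds `lengthX * lengthY` elements. The helper names `flat_index`, `outer_product` and `max_of` are hypothetical and are not part of the original code. In the kernel, `a` and `b` would correspond to `get_global_id(0)` and `get_global_id(1)`. Note that the multiplier in this sketch is the size of the second dimension (`length_y`), not its largest index.

```c
#include <stddef.h>

/* Hypothetical helper: row-major flat index of the 2D position (a, b).
 * length_y is the SIZE of the second dimension (largest index + 1). */
static size_t flat_index(size_t a, size_t b, size_t length_y)
{
    return a * length_y + b;
}

/* Host-side emulation of the 2D range: each (a, b) pair plays the role
 * of one work-item and writes to its own slot in result. */
static void outer_product(const double *x, size_t length_x,
                          const double *y, size_t length_y,
                          double *result)
{
    for (size_t a = 0; a < length_x; ++a) {
        for (size_t b = 0; b < length_y; ++b) {
            result[flat_index(a, b, length_y)] = x[a] * y[b];
        }
    }
}

/* Sequential stand-in for the reduction step: returns the maximum of
 * the n values in v (n must be at least 1). */
static double max_of(const double *v, size_t n)
{
    double best = v[0];
    for (size_t i = 1; i < n; ++i) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}
```

With `x = {1, 2, 3}` and `y = {4, 5}`, every pair gets a distinct slot from 0 to 5. A local ID only counts positions inside one work-group, so it would repeat across groups.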