arrays - Handling arrays of multiple sizes with CUDA

Question

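OK, so I have this huge array; let's call it J.

For each element of J there is an associated array TJ, but the length of TJ varies from one element of J to the next.

So, for example, the sequential program would look like this: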

for (J = 0; J < length(ARRAY_J); J++)
do
  for (T = 0; T < length(ARRAY_TJ); T++)
  do
    ARRAY_RESULT[J] += ARRAY_J[J] + ARRAY_TJ[T]
  end
end

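So I thought that if I arrange the threads in 2D blocks, I can use the thread's x index for J and the thread's y index for T.

Now, I know the length of J, but the length of T varies, so I don't know how to define it in CUDA.

For example:

ARRAY_RESULT[blockIdx.y*blockDim.y+threadIdx.y] += ARRAY_J[blockIdx.y*blockDim.y+threadIdx.y] + ARRAY_TJ[blockIdx.x*blockDim.x+threadIdx.x]

So, given that the length of ARRAY_TJ is variable, how can I define the block dimensions here? Should I use the largest ARRAY_TJ length? And would code like the above even work? For each value of ARRAY_J, would it add up length(ARRAY_TJ) values?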

score 1 · Accepted Answer

I think it would be better to use 1D blocks, with one thread per element of J, and have each thread do:

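int thread = blockIdx.x * blockDim.x + threadIdx.x;
if (thread < length(ARRAY_J))                    // skip threads past the end of J
    for (int T = 0; T < length(ARRAY_TJ); T++)   // length() is a placeholder for this element's TJ length
        ARRAY_RESULT[thread] += ARRAY_J[thread] + ARRAY_TJ[T];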
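
The snippet above leaves open how a thread finds its own TJ array and that array's length. One possible layout, sketched here as an assumption rather than something taken from the question, is to store all TJ arrays back to back in one buffer and keep an offsets array with one extra entry, so that the slice for element j runs from tj_offsets[j] to tj_offsets[j + 1]. The names sum_per_j, run_sum_per_j, tj_values and tj_offsets are made up for this illustration, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per element of J. tj_values holds every TJ array back to back;
// tj_offsets[j] .. tj_offsets[j + 1] delimits the slice belonging to element j.
// (This flattened layout is an assumption, not part of the original question.)
__global__ void sum_per_j(const float* array_j, const float* tj_values,
                          const int* tj_offsets, float* array_result, int num_j)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= num_j) return;              // the grid may overshoot num_j

    float acc = 0.0f;                    // accumulate in a register
    for (int t = tj_offsets[j]; t < tj_offsets[j + 1]; ++t)
        acc += array_j[j] + tj_values[t];
    array_result[j] += acc;              // only this thread touches element j
}

// Hypothetical host wrapper: copies inputs to the device, launches the kernel,
// and copies the result back. Return codes are not checked here.
std::vector<float> run_sum_per_j(const std::vector<float>& array_j,
                                 const std::vector<float>& tj_values,
                                 const std::vector<int>& tj_offsets)
{
    int num_j = static_cast<int>(array_j.size());
    std::vector<float> result(num_j, 0.0f);
    if (num_j == 0) return result;

    size_t j_bytes = num_j * sizeof(float);
    size_t tj_bytes = tj_values.size() * sizeof(float);
    size_t off_bytes = tj_offsets.size() * sizeof(int);

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int* d_off = nullptr;
    cudaMalloc(&d_j, j_bytes);
    cudaMalloc(&d_tj, tj_bytes);
    cudaMalloc(&d_off, off_bytes);
    cudaMalloc(&d_res, j_bytes);
    cudaMemcpy(d_j, array_j.data(), j_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_values.data(), tj_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offsets.data(), off_bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_res, 0, j_bytes);       // results start at zero

    int threads = 256;
    int blocks = (num_j + threads - 1) / threads;   // round up to cover all of J
    sum_per_j<<<blocks, threads>>>(d_j, d_tj, d_off, d_res, num_j);

    cudaMemcpy(result.data(), d_res, j_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_j);
    cudaFree(d_tj);
    cudaFree(d_off);
    cudaFree(d_res);
    return result;
}
```

With this layout the grid only has to cover the length of J, so the varying TJ lengths never enter the launch configuration. Threads with long TJ slices will simply run longer than their neighbours.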

If you try to do it in 2D, with the second dimension covering the TJ array, several threads will write to the same element of ARRAY_RESULT at the same time. That is a data race, so the sum would come out wrong unless the updates are made atomic or combined with a reduction, and CUDA offers no easy way to manage critical sections.
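
For comparison, here is a rough sketch of what the 2D arrangement from the question could look like if the concurrent writes are made safe with atomicAdd. It is not what the answer recommends, and it assumes the same hypothetical flattened layout (tj_values plus tj_offsets with one extra entry); sum_2d_atomic and run_sum_2d_atomic are illustrative names, and error checking is omitted. The y extent of the grid is sized from the longest TJ, and threads whose y index falls past the end of their own slice do nothing.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// x indexes J, y indexes a position inside that element's slice of TJ.
// Every thread in the same column adds into the same array_result[j],
// so the update goes through atomicAdd to avoid a data race.
__global__ void sum_2d_atomic(const float* array_j, const float* tj_values,
                              const int* tj_offsets, float* array_result,
                              int num_j)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int t = blockIdx.y * blockDim.y + threadIdx.y;
    if (j >= num_j) return;
    int len = tj_offsets[j + 1] - tj_offsets[j];
    if (t >= len) return;                // past the end of this slice: idle
    atomicAdd(&array_result[j], array_j[j] + tj_values[tj_offsets[j] + t]);
}

// Hypothetical host wrapper; return codes are not checked here.
std::vector<float> run_sum_2d_atomic(const std::vector<float>& array_j,
                                     const std::vector<float>& tj_values,
                                     const std::vector<int>& tj_offsets)
{
    int num_j = static_cast<int>(array_j.size());
    std::vector<float> result(num_j, 0.0f);
    if (num_j == 0) return result;

    // The grid's y extent must cover the longest TJ slice.
    int max_len = 0;
    for (int j = 0; j < num_j; ++j)
        max_len = std::max(max_len, tj_offsets[j + 1] - tj_offsets[j]);
    if (max_len == 0) return result;     // nothing to add, and grid.y would be 0

    size_t j_bytes = num_j * sizeof(float);
    size_t tj_bytes = tj_values.size() * sizeof(float);
    size_t off_bytes = tj_offsets.size() * sizeof(int);

    float *d_j = nullptr, *d_tj = nullptr, *d_res = nullptr;
    int* d_off = nullptr;
    cudaMalloc(&d_j, j_bytes);
    cudaMalloc(&d_tj, tj_bytes);
    cudaMalloc(&d_off, off_bytes);
    cudaMalloc(&d_res, j_bytes);
    cudaMemcpy(d_j, array_j.data(), j_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_tj, tj_values.data(), tj_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, tj_offsets.data(), off_bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_res, 0, j_bytes);       // atomicAdd accumulates onto zero

    dim3 block(16, 16);
    dim3 grid((num_j + block.x - 1) / block.x,
              (max_len + block.y - 1) / block.y);
    sum_2d_atomic<<<grid, block>>>(d_j, d_tj, d_off, d_res, num_j);

    cudaMemcpy(result.data(), d_res, j_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_j);
    cudaFree(d_tj);
    cudaFree(d_off);
    cudaFree(d_res);
    return result;
}
```

The trade-off is that every addition becomes an atomic update on global memory, and many threads sit idle when TJ lengths differ widely, which is part of why the one-thread-per-J version is the simpler starting point. Floating-point atomic adds also arrive in no fixed order, so results may differ in the last bits from run to run.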
