I tried a method that avoids the data transfer from host to device. Normally, we assign values to the elements of a host array in a loop and then transfer the array to the device. This works fine for me with 1D and 2D arrays. The new method I tried is to assign the values to the array elements inside the kernel. This succeeded for 1D arrays, but for a 2D array the result is 0. My device can support (512,512) threads per block. The output values are correct up to Length=22, but the program prints '0' for Length=23 [22<sqrt(512)<23]. From [22<sqrt(512)<23], it looks as if only 22x22 threads are being used. What is the problem? Why is this happening?

The Code:

    const int Length=23;

Main Function:

    int A[Length],B[Length],C[Length],D[Length],*Ad,*Bd;
    int size=Length*sizeof(int);
    cudaMalloc((void**)&Ad,size);
    cudaMalloc((void**)&Bd,size);
    dim3 dimGrid(1,1);
    dim3 dimBlock(Length,Length);
    FuncG<<<dimGrid,dimBlock>>>(Ad,Bd);
    cudaMemcpy(C,Ad,size,cudaMemcpyDeviceToHost);
    cudaMemcpy(D,Bd,size,cudaMemcpyDeviceToHost);
    for(int i=0;i<Length;i++){
        printf("%d  %d\n",C[i],D[i]);
    }
    return 0;

Kernel Function:

    __global__ void FuncG(int *Ad,int *Bd){
        int tx=threadIdx.x;
        int ty=threadIdx.y;
        Ad[tx]=tx;
        Bd[ty]=ty;
    }

1 Answer

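Your device can only support 512 threads per block in total. Each of the first two thread block dimensions has a maximum size of 512, but the product of all dimensions must still not exceed 512. A 22x22 block (484 threads) is a legal block size, but a 23x23 block (529 threads) is not.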
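
The limits described above can be read from the device at run time. The following is a minimal sketch, assuming device index 0 is the GPU in question; it prints the per-block total and the per-dimension maximums, then checks whether a `Length` x `Length` block fits in a single block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int Length = 23;  // same value as in the question

    cudaDeviceProp prop;
    // Device 0 is an assumption; use the index of the device you actually run on.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query device 0\n");
        return 1;
    }

    // Total number of threads allowed in one block (product of all dimensions).
    printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
    // Maximum extent of each individual block dimension.
    printf("maxThreadsDim: (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);

    int threads = Length * Length;  // 23 * 23 = 529
    printf("%dx%d block = %d threads: %s\n", Length, Length, threads,
           threads <= prop.maxThreadsPerBlock ? "fits in one block"
                                              : "too many for one block");
    return 0;
}
```

The point is that `maxThreadsDim` and `maxThreadsPerBlock` are separate limits, and a launch has to satisfy both.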

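You are getting 0 output because the kernel never runs. If you check, you will find that the kernel launch fails with an invalid execution configuration error. The canonical way to check for this sort of launch failure is as follows: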

    FuncG<<<dimGrid,dimBlock>>>(Ad,Bd);
    if (cudaPeekAtLastError() != cudaSuccess) {
        // handle error.....
    }
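
As a fuller illustration of the check above, here is a hedged sketch of the question's program with every runtime call checked and the work split over several smaller blocks. The `check` helper and the 16x16 tile size are assumptions made for this example and are not part of the original code; the kernel is also changed to use a global index with a bounds check so that it works with more than one block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

const int Length = 23;

// Variant of the question's kernel: global indices plus a bounds check,
// so threads outside the Length x Length range do nothing.
__global__ void FuncG(int *Ad, int *Bd) {
    int tx = blockIdx.x * blockDim.x + threadIdx.x;
    int ty = blockIdx.y * blockDim.y + threadIdx.y;
    if (tx < Length && ty < Length) {
        Ad[tx] = tx;
        Bd[ty] = ty;
    }
}

// Hypothetical helper (not a CUDA API): print the error text and report failure.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    int C[Length], D[Length], *Ad, *Bd;
    int size = Length * sizeof(int);
    if (!check(cudaMalloc((void**)&Ad, size), "cudaMalloc Ad")) return 1;
    if (!check(cudaMalloc((void**)&Bd, size), "cudaMalloc Bd")) return 1;

    // 16x16 = 256 threads per block: an assumed tile size below the 512 limit.
    dim3 dimBlock(16, 16);
    // Round up so the grid covers all Length x Length indices.
    dim3 dimGrid((Length + dimBlock.x - 1) / dimBlock.x,
                 (Length + dimBlock.y - 1) / dimBlock.y);

    FuncG<<<dimGrid, dimBlock>>>(Ad, Bd);
    // Catches launch failures such as an invalid execution configuration.
    if (!check(cudaPeekAtLastError(), "kernel launch")) return 1;
    // Catches errors that occur while the kernel is running.
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return 1;

    if (!check(cudaMemcpy(C, Ad, size, cudaMemcpyDeviceToHost), "copy Ad")) return 1;
    if (!check(cudaMemcpy(D, Bd, size, cudaMemcpyDeviceToHost), "copy Bd")) return 1;
    for (int i = 0; i < Length; i++) {
        printf("%d  %d\n", C[i], D[i]);
    }

    cudaFree(Ad);
    cudaFree(Bd);
    return 0;
}
```

Restoring the original `dim3 dimGrid(1,1); dim3 dimBlock(Length,Length);` in this sketch on a device limited to 512 threads per block should make the "kernel launch" check fail and print the error message instead of silently producing zeros.
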
answered 2012-09-19T10:33:59.917