I tried a method that avoids the host-to-device data transfer. Normally, we assign values to the elements of a host array in a loop and then transfer the array to the device. This works fine for me with both 1D and 2D arrays. The new method I tried is to assign the values to the array elements inside the kernel. This succeeded for 1D arrays, but for the 2D case the result is 0. My device can support (512,512) threads per block. The output values are correct up to Length=22, but the output is 0 for Length=23. Since 22 < sqrt(512) < 23, it looks like only 22x22 threads are being used. What is the problem? Why is this happening?
The Code:
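#include <cstdio>
#include <cuda_runtime.h>

const int Length=23;
// Forward declaration so main() can launch the kernel defined below.
__global__ void FuncG(int *Ad,int *Bd);
Main Function:
int main(){
    int C[Length],D[Length],*Ad,*Bd;
    int size=Length*sizeof(int);
    // Allocate two device arrays of Length ints; nothing is copied from the host.
    cudaMalloc((void**)&Ad,size);
    cudaMalloc((void**)&Bd,size);
    // A single block of Length x Length threads.
    dim3 dimGrid(1,1);
    dim3 dimBlock(Length,Length);
    // The kernel fills both arrays directly on the device.
    FuncG<<<dimGrid,dimBlock>>>(Ad,Bd);
    // Copy the results back to the host and print them.
    cudaMemcpy(C,Ad,size,cudaMemcpyDeviceToHost);
    cudaMemcpy(D,Bd,size,cudaMemcpyDeviceToHost);
    for(int i=0;i<Length;i++){
        printf("%d %d\n",C[i],D[i]);
    }
    cudaFree(Ad);
    cudaFree(Bd);
    return 0;
}
Kernel Function:
__global__ void FuncG(int *Ad,int *Bd){
    int tx=threadIdx.x;
    int ty=threadIdx.y;
    // Thread (tx,ty) writes Ad[tx] and Bd[ty]. Threads that share tx (or ty)
    // write the same value, so the repeated writes do not conflict.
    Ad[tx]=tx;
    Bd[ty]=ty;
}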
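The kernel launch in main is never checked for errors, and the `<<<...>>>` launch syntax itself returns nothing, so a failed launch is silent unless the error is queried. A minimal sketch of such a check, assuming the same Length x Length single-block launch: `cudaGetLastError` reports launch errors (such as an invalid configuration), and `cudaDeviceSynchronize` reports errors that occur while the kernel runs. The names `FillIndex` and `checkCuda` are hypothetical, used only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel using the same pattern as FuncG.
__global__ void FillIndex(int *Ad, int *Bd)
{
    Ad[threadIdx.x] = threadIdx.x;
    Bd[threadIdx.y] = threadIdx.y;
}

// Hypothetical helper: print the CUDA error string and report failure.
bool checkCuda(cudaError_t err, const char *what)
{
    assert(what != nullptr);
    if (err != cudaSuccess) {
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int Length = 23;
    int *Ad, *Bd;
    size_t size = Length * sizeof(int);
    if (!checkCuda(cudaMalloc((void**)&Ad, size), "cudaMalloc Ad")) return 1;
    if (!checkCuda(cudaMalloc((void**)&Bd, size), "cudaMalloc Bd")) return 1;

    dim3 dimBlock(Length, Length);
    FillIndex<<<1, dimBlock>>>(Ad, Bd);
    // Launch errors are retrieved here; the <<<...>>> call does not return them.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel executes are reported after synchronizing.
    ok = ok && checkCuda(cudaDeviceSynchronize(), "kernel execution");

    cudaFree(Ad);
    cudaFree(Bd);
    return ok ? 0 : 1;
}
```

Checking right after the launch separates configuration problems from problems inside the kernel, which a later `cudaMemcpy` would not reveal on its own.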