I'm trying to fetch data from a 3D texture in CUDA C, but the call to tex3D() always returns zero. Here is the relevant code:
Host:
#define L 64
typedef uint2 splitspin_t;
texture<splitspin_t, 3> texRef;
cudaArray *arrayPointer;
cudaExtent LLLextent = make_cudaExtent(L, L, L);
cudaChannelFormatDesc cf = cudaCreateChannelDesc<splitspin_t>();
cudaChk(cudaMalloc3DArray( &arrayPointer, &cf, LLLextent ));
cudaMemcpy3DParms params = {0};
params.extent = LLLextent;
params.kind = cudaMemcpyHostToDevice;
params.srcPtr.ptr = h; // size L*L*L*sizeof(splitspin_t) allocated by malloc
params.srcPtr.pitch = sizeof(splitspin_t) * L;
params.srcPtr.xsize = L;
params.srcPtr.ysize = L;
params.srcPos.x = 0;
params.srcPos.y = 0;
params.srcPos.z = 0;
params.dstArray = arrayPointer;
params.dstPos.x = 0;
params.dstPos.y = 0;
params.dstPos.z = 0;
cudaChk(cudaMemcpy3D( &params ));
texRef.normalized = 0;
texRef.filterMode = cudaFilterModePoint;
texRef.addressMode[0] = cudaAddressModeClamp;
texRef.addressMode[1] = cudaAddressModeClamp;
texRef.addressMode[2] = cudaAddressModeClamp;
cudaChk(cudaBindTextureToArray( texRef, arrayPointer, cf ));
cudaFreeArray(arrayPointer);
Device:
#define GX (threadIdx.x + blockIdx.x*blockDim.x)
#define GY (threadIdx.y + blockIdx.y*blockDim.y)
#define GZ (threadIdx.z + blockIdx.z*blockDim.z)
printf("%lX %lx\n", tex3D(texRef, GX, GY, GZ).y, tex3D(texRef, GX, GY, GZ).x); // always prints zeros
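I have verified that the memory h points to is initialized to non-zero values. I also verified that cudaMemcpy3D succeeds: after the first cudaMemcpy3D I zeroed h, used a second cudaMemcpy3D to copy from arrayPointer back into h, and then checked that h contained the same data as before. I thought the problem might also be caused by my use of a non-standard type (uint2), but typedef-ing splitspin_t to float did not solve it.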
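For reference, the round-trip check looked roughly like the following. This is a minimal sketch and not my exact code: the helper name roundTripMatches, its parameter n, and the fill pattern are made up for illustration, and it builds the host pointers with make_cudaPitchedPtr instead of filling the fields by hand.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

typedef uint2 splitspin_t;

// Hypothetical helper (not from the code above): uploads an n*n*n host
// buffer into a 3D cudaArray, copies it back into a second zeroed buffer,
// and returns true only if every CUDA call succeeded and the data matches.
bool roundTripMatches(int n)
{
    const size_t count = (size_t)n * n * n;
    const size_t bytes = count * sizeof(splitspin_t);
    splitspin_t *h    = (splitspin_t *)malloc(bytes);
    splitspin_t *back = (splitspin_t *)malloc(bytes);
    if (!h || !back) { free(h); free(back); return false; }

    // Arbitrary non-zero pattern so an all-zero readback is detectable.
    for (size_t i = 0; i < count; ++i)
        h[i] = make_uint2((unsigned)i + 1u, ~(unsigned)i);
    memset(back, 0, bytes);

    cudaArray_t arr = NULL;
    // For copies that involve a cudaArray, the extent is given in elements.
    cudaExtent extent = make_cudaExtent(n, n, n);
    cudaChannelFormatDesc cf = cudaCreateChannelDesc<splitspin_t>();
    bool ok = cudaMalloc3DArray(&arr, &cf, extent) == cudaSuccess;

    if (ok) {  // host -> array
        cudaMemcpy3DParms up = {0};
        up.srcPtr   = make_cudaPitchedPtr(h, n * sizeof(splitspin_t), n, n);
        up.dstArray = arr;
        up.extent   = extent;
        up.kind     = cudaMemcpyHostToDevice;
        ok = cudaMemcpy3D(&up) == cudaSuccess;
    }
    if (ok) {  // array -> host, into the zeroed buffer
        cudaMemcpy3DParms down = {0};
        down.srcArray = arr;
        down.dstPtr   = make_cudaPitchedPtr(back, n * sizeof(splitspin_t), n, n);
        down.extent   = extent;
        down.kind     = cudaMemcpyDeviceToHost;
        ok = cudaMemcpy3D(&down) == cudaSuccess;
    }
    if (ok) ok = memcmp(h, back, bytes) == 0;

    if (arr) cudaFreeArray(arr);
    free(h);
    free(back);
    return ok;
}
```

Calling roundTripMatches(L) from host code mirrors the check described above; it only exercises the copy path and says nothing about the texture binding itself.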
I therefore suspect the cudaBindTextureToArray call, but I can't see any mistake in what I've done so far.
Thanks in advance.