cuda - New CUDA texture object - getting wrong data in the 2D case

Question

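In CUDA 5.0, NVIDIA added a "texture object" (cudaTextureObject_t) to make textures easier to use. Previously, textures had to be defined as global variables.

I followed this NVIDIA example on using cudaTextureObject_t. It works for the 1D case. I tried to extend the example to work with 2D pitched memory: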

#define WIDTH 6
#define HEIGHT 2
int width = WIDTH; int height = HEIGHT;
float h_buffer[12] = {1,2,3,4,5,6,7,8,9,10,11,12};
float* d_buffer;
size_t pitch;
cudaMallocPitch(&d_buffer, &pitch, sizeof(float)*width, height);
cudaMemcpy2D(d_buffer, pitch, &h_buffer, sizeof(float)*width, sizeof(float)*width, height, cudaMemcpyHostToDevice);
printf("pitch = %zu \n", pitch); // pitch is a size_t, so use %zu

//CUDA 5 texture objects: https://developer.nvidia.com/content/cuda-pro-tip-kepler-texture-objects-improve-performance-and-flexibility
cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypePitch2D;
resDesc.res.pitch2D.devPtr = d_buffer;
resDesc.res.pitch2D.pitchInBytes =  pitch;
resDesc.res.pitch2D.width = width;
resDesc.res.pitch2D.height = height;
resDesc.res.pitch2D.desc.f = cudaChannelFormatKindFloat;
resDesc.res.pitch2D.desc.x = 32; // bits per channel 
resDesc.res.pitch2D.desc.y = 32; 
cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.readMode = cudaReadModeElementType;
cudaTextureObject_t tex;
cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

To check whether the data is actually accessible through the texture cache, I print a few values in this kernel:

__global__ void printGpu_tex(cudaTextureObject_t tex) {
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    int tidy = blockIdx.y * blockDim.y + threadIdx.y;
    if(tidx < WIDTH && tidy < HEIGHT){
        float x = tex2D<float>(tex, tidy, tidx);
        printf("tex2D<float>(tex, %d, %d) = %f \n", tidy, tidx, x);
    }
}

I expected the output to be "1,2,3,...,12". Instead, it prints "1,7,7,7,...3,9,...":

tex2D<float>(tex, 0, 0) = 1.000000 
tex2D<float>(tex, 0, 1) = 7.000000 
tex2D<float>(tex, 0, 2) = 7.000000 
tex2D<float>(tex, 0, 3) = 7.000000 
tex2D<float>(tex, 0, 4) = 7.000000 
tex2D<float>(tex, 0, 5) = 7.000000 
tex2D<float>(tex, 1, 0) = 3.000000 
tex2D<float>(tex, 1, 1) = 9.000000 
tex2D<float>(tex, 1, 2) = 9.000000 
tex2D<float>(tex, 1, 3) = 9.000000 
tex2D<float>(tex, 1, 4) = 9.000000 
tex2D<float>(tex, 1, 5) = 9.000000

To verify that the d_buffer data is set up correctly, I also wrote a "print kernel" for the raw d_buffer array that does not use the texture cache:

__global__ void printGpu_vanilla(float* d_buffer, int pitch) {
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    int tidy = blockIdx.y * blockDim.y + threadIdx.y;
    if(tidx < WIDTH && tidy < HEIGHT){
        float x = d_buffer[tidy*pitch + tidx];
        printf("d_buffer[%d][%d] = %f \n", tidy, tidx, x);
    }
}

The output looks fine (unlike the texture cache version):

d_buffer[0][0] = 1.000000
d_buffer[0][1] = 2.000000
d_buffer[0][2] = 3.000000
d_buffer[0][3] = 4.000000
d_buffer[0][4] = 5.000000
d_buffer[0][5] = 6.000000
d_buffer[1][0] = 7.000000
d_buffer[1][1] = 8.000000
d_buffer[1][2] = 9.000000
d_buffer[1][3] = 10.000000
d_buffer[1][4] = 11.000000
d_buffer[1][5] = 12.000000

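Any ideas about what might be going wrong with the texture cache version?

Download:

Working example code using cudaTextureObject_t for a 1D texture
Broken example code using cudaTextureObject_t for a 2D texture (as described above)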

score 4 · Accepted Answer

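The cudaChannelFormatDesc you put in resDesc.res.pitch2D.desc is wrong: y should be 0.

To set the format descriptor correctly, use the cudaCreateChannelDesc<>() function, i.e. resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();, instead of setting the fields manually.

resDesc.res.pitch2D.desc.y = 32 is valid for a float2 texture.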
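
For illustration, here is a minimal sketch of the question's setup with the descriptor built by cudaCreateChannelDesc<float>(). It assumes the same 6x2 buffer as in the question. The main() wrapper, the launch configuration, and the error check are additions for the sake of a complete program, not part of the original code. Note that tex2D takes the x (column) coordinate first, so the sketch passes (tidx, tidy).

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define WIDTH 6
#define HEIGHT 2

// One thread per texel; tex2D expects (x, y) = (column, row).
__global__ void printGpu_tex(cudaTextureObject_t tex) {
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int tidy = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (tidx < WIDTH && tidy < HEIGHT) {
        float v = tex2D<float>(tex, tidx, tidy);
        printf("tex2D<float>(tex, %d, %d) = %f\n", tidx, tidy, v);
    }
}

int main() {
    float h_buffer[WIDTH * HEIGHT] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    float* d_buffer = nullptr;
    size_t pitch = 0;
    cudaMallocPitch((void**)&d_buffer, &pitch, sizeof(float) * WIDTH, HEIGHT);
    cudaMemcpy2D(d_buffer, pitch, h_buffer, sizeof(float) * WIDTH,
                 sizeof(float) * WIDTH, HEIGHT, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypePitch2D;
    resDesc.res.pitch2D.devPtr = d_buffer;
    resDesc.res.pitch2D.pitchInBytes = pitch;
    resDesc.res.pitch2D.width = WIDTH;
    resDesc.res.pitch2D.height = HEIGHT;
    // Single-channel float: x = 32 bits, y = z = w = 0.
    resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaError_t err = cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    if (err != cudaSuccess) {
        printf("cudaCreateTextureObject failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    dim3 block(8, 2);  // assumed launch shape; covers the 6x2 buffer
    printGpu_tex<<<1, block>>>(tex);
    cudaDeviceSynchronize();  // flush device-side printf output

    cudaDestroyTextureObject(tex);
    cudaFree(d_buffer);
    return 0;
}
```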

score 0 · Answer

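Besides the cudaChannelFormatDesc, there seems to be a logic issue in your code. It is not a big deal, but it can be very misleading if you are not careful. If you want to follow the way CUDA threads are organized into blocks and grids and the way warps are scheduled (and if you want your code to be consistent with the C++ "row-major" convention), it is better to treat x as the fastest-varying dimension (as in row-major order). Since your code implies that y varies faster than x, the more appropriate approach is to swap the indices in your code: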

float x = tex2D<float>(tex, tidx, tidy);
printf("tex2D<float>(tex, %d, %d) = %f \n", tidx, tidy, x);
...
printf("d_buffer[%d][%d] = %f \n", tidx, tidy, x);

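It is worth mentioning again that this is not a big deal, but it can be quite confusing, especially when you want to integrate this kernel with other parts of your code.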
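
As a rough sketch of the same convention applied to the raw pitched buffer, the kernel below treats tidx as the column (fastest-varying) and tidy as the row. The kernel name printGpu_rowMajor is hypothetical, and the sketch assumes the pitch is passed in bytes exactly as returned by cudaMallocPitch, so the row start is computed through a char pointer rather than by multiplying a float index by the pitch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WIDTH 6
#define HEIGHT 2

// Hypothetical kernel: x indexes columns (fastest-varying), y indexes rows.
__global__ void printGpu_rowMajor(const float* d_buffer, size_t pitchInBytes) {
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int tidy = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (tidx < WIDTH && tidy < HEIGHT) {
        // Rows are pitchInBytes apart, so step in bytes before indexing floats.
        const float* row =
            (const float*)((const char*)d_buffer + tidy * pitchInBytes);
        printf("row %d, col %d = %f\n", tidy, tidx, row[tidx]);
    }
}

int main() {
    float h_buffer[WIDTH * HEIGHT] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    float* d_buffer = nullptr;
    size_t pitch = 0;
    cudaMallocPitch((void**)&d_buffer, &pitch, sizeof(float) * WIDTH, HEIGHT);
    cudaMemcpy2D(d_buffer, pitch, h_buffer, sizeof(float) * WIDTH,
                 sizeof(float) * WIDTH, HEIGHT, cudaMemcpyHostToDevice);

    dim3 block(8, 2);  // assumed launch shape; covers the 6x2 buffer
    printGpu_rowMajor<<<1, block>>>(d_buffer, pitch);
    cudaDeviceSynchronize();  // flush device-side printf output

    cudaFree(d_buffer);
    return 0;
}
```

With this layout, threads that are adjacent in x read adjacent floats within one row, which matches how threads are grouped into warps.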
