
I have exactly the same problem as the one described in this post: CUDA Error on cudaBindTexture2D

I even get the following error:

"error 18: invalid texture reference", and I am also seeing that it "does not throw an error on the cudaMalloc but only on the cudaBindTexture".

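Unfortunately, the way the poster (Anton Roth) answered his own question is a bit too cryptic for someone like me who has just started with CUDA:

"The answer was in the comments, I used a sm which my GPU was not compatible with."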

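"Not compatible with the GPU" makes sense, because the sample program FluidsGL (called "Fluids (OpenGL Version)" in the NVIDIA CUDA Samples Browser) fails on my laptop but runs fine on my desktop at work. Unfortunately, I still don't know what "in the comments" refers to, or even how to check which SM versions my GPU is compatible with.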

Here is the code that seems to cause the problem:

#define DIM 512

main

setupTexture(DIM, DIM);
bindTexture();

fluidsGL_kernels.cu

texture<float2, 2> texref;
static cudaArray *array = NULL;

void setupTexture(int x, int y)
{
    // Wrap mode appears to be the new default
    texref.filterMode = cudaFilterModeLinear;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();

    cudaMallocArray(&array, &desc, y, x);
    getLastCudaError("cudaMalloc failed");
}

void bindTexture(void)
{
    cudaBindTextureToArray(texref, array);//this function itself doesn't throw the error but error 18 is caught by the function below
    getLastCudaError("cudaBindTexture failed");
}

Hardware information

Here is the output of deviceQuery:

Device 0: "GeForce 9800M GS"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 8) Multiprocessors x (  8) CUDA Cores/MP:    64 CUDA Cores
  GPU Clock rate:                                1325 MHz (1.32 GHz)
  Memory Clock rate:                             799 Mhz
  Memory Bus Width:                              256-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           8 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce 9800M GS

I know my GPU is a bit old, but it still runs most of the samples just fine.


1 Answer


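You need to compile the code for the correct architecture (as stated in the post you linked).

Since you have a CC 1.1 device, use the following nvcc compilation option: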

-gencode arch=compute_11,code=sm_11
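
As a rough illustration of where the "11" in that option comes from, here is a minimal sketch. It assumes a hypothetical helper named gencodeFlag (not part of the CUDA toolkit) that takes the two numbers deviceQuery prints as "CUDA Capability Major/Minor version number" and builds the matching option string:

```cpp
#include <string>

// Hypothetical helper (not part of the CUDA toolkit): builds the nvcc
// -gencode option for a device of compute capability major.minor.
// The two numbers are the ones deviceQuery prints as
// "CUDA Capability Major/Minor version number"; at run time they are also
// available as cudaDeviceProp::major and cudaDeviceProp::minor after a
// call to cudaGetDeviceProperties().
std::string gencodeFlag(int major, int minor)
{
    // compute_XY names the virtual (PTX) architecture, sm_XY the real one.
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For the GeForce 9800M GS above (capability 1.1), gencodeFlag(1, 1) yields the option shown. Whether a given toolkit version still accepts that architecture is something to check against its nvcc documentation.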

The default Visual Studio project or Makefile may not compile for the correct architecture, so always make sure that it does.

For Visual Studio, see this answer: https://stackoverflow.com/a/14413360/1043187

For Makefiles, it depends. The CUDA SDK samples usually have a GENCODE_FLAGS variable that you can modify.

Answered 2013-06-13T06:19:07.223