cuda - Different memory allocations in CUDA

Question

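I am developing a small application using CUDA.
I have a huge 2D array (too large for shared memory) that threads in all blocks will read from constantly, at random positions.
This 2D array is read-only.
Where should I allocate it? Global memory? Constant memory? Texture memory?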

score 2 · Accepted Answer

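Depending on how much texture memory your device offers, you should put the array there. Texture memory is built on a cache that exploits spatial locality: accesses are optimized when threads with consecutive identifiers read data elements stored relatively close to each other.
Moreover, this locality is implemented for 2D accesses. So when each thread reads an element of an array stored in texture memory, nearby threads reading nearby elements in either dimension benefit from the cache, and you take full advantage of the memory architecture.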
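As a rough illustration of the texture approach described above, here is a minimal sketch using the texture object API. The kernel `gatherFromTexture`, the helper `readViaTexture`, and the test table (where element (x, y) holds `y * width + x`) are hypothetical names and data chosen for this example, not something from the question. Error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread fetches one element at an arbitrary (x, y) position
// through the texture cache. coords and out are hypothetical buffers.
__global__ void gatherFromTexture(cudaTextureObject_t tex,
                                  const int2* coords, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Unnormalized coordinates; +0.5f addresses the texel center.
        out[i] = tex2D<float>(tex, coords[i].x + 0.5f, coords[i].y + 0.5f);
    }
}

// Hypothetical host helper: builds a width x height table where
// element (x, y) holds y * width + x, then reads (x, y) on the device.
float readViaTexture(int width, int height, int x, int y)
{
    std::vector<float> host(static_cast<size_t>(width) * height);
    for (size_t i = 0; i < host.size(); ++i) host[i] = static_cast<float>(i);

    // A CUDA array stores the data in a layout suited to 2D locality.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array;
    cudaMallocArray(&array, &desc, width, height);
    cudaMemcpy2DToArray(array, 0, 0, host.data(),
                        width * sizeof(float),   // source pitch in bytes
                        width * sizeof(float),   // row width in bytes
                        height, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;      // no interpolation
    texDesc.readMode = cudaReadModeElementType;    // return raw floats
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    int2 hCoord = make_int2(x, y);
    int2* dCoord = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dCoord, sizeof(int2));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dCoord, &hCoord, sizeof(int2), cudaMemcpyHostToDevice);

    gatherFromTexture<<<1, 32>>>(tex, dCoord, dOut, 1);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(array);
    cudaFree(dCoord);
    cudaFree(dOut);
    return result;
}
```

With this test table, `readViaTexture(64, 64, 5, 3)` reads the element at column 5, row 3. In a real application the coordinate buffer would hold one random position per thread, and the profiler would tell you whether the texture cache actually helps for your access pattern.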

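Unfortunately, this memory is not that big, and with a huge array you may not be able to fit your data in it. In that case, you can't avoid using global memory.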
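If the array ends up in global memory, one option worth profiling is to mark the pointer read-only so the hardware can serve loads through its read-only data cache. The sketch below is not from the answer. It assumes a device of compute capability 3.5 or newer (required for `__ldg`), and `gatherFromGlobal` and `readViaGlobal` are hypothetical names. Error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread reads one element at an arbitrary flat index.
// const + __restrict__ tells the compiler the table is read-only and
// not aliased; __ldg requests a load through the read-only data cache.
__global__ void gatherFromGlobal(const float* __restrict__ table,
                                 const int* __restrict__ idx,
                                 float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __ldg(&table[idx[i]]);
    }
}

// Hypothetical host helper: the table stores y * width + x at (x, y),
// laid out row by row in ordinary global memory.
float readViaGlobal(int width, int height, int x, int y)
{
    std::vector<float> host(static_cast<size_t>(width) * height);
    for (size_t i = 0; i < host.size(); ++i) host[i] = static_cast<float>(i);

    float* dTable = nullptr;
    int* dIdx = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dTable, host.size() * sizeof(float));
    cudaMalloc(&dIdx, sizeof(int));
    cudaMalloc(&dOut, sizeof(float));

    int flat = y * width + x;  // row-major flattening of (x, y)
    cudaMemcpy(dTable, host.data(), host.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, &flat, sizeof(int), cudaMemcpyHostToDevice);

    gatherFromGlobal<<<1, 32>>>(dTable, dIdx, dOut, 1);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dTable);
    cudaFree(dIdx);
    cudaFree(dOut);
    return result;
}
```

Whether this beats a texture fetch depends on the device and on how scattered the reads really are, so it is best treated as one more variant to measure.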

score 1 · Accepted Answer

I agree with jHackTheRipper: a simple solution would be to use texture memory and then profile using the Compute Visual Profiler. Here's a good set of slides from NVIDIA about the different memory types for image convolution; it shows that good shared memory usage combined with global reads was not much faster than using texture memory. In your case, the texture cache should give you some of the benefit that coalescing normally provides, which you wouldn't usually get when reading random values from global memory.

score 0 · Accepted Answer

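If it is small enough to fit in constant or texture memory, I would just try all three.

One interesting option you haven't listed is mapped memory on the host. You can allocate memory on the host that is accessible from the device without explicitly transferring it to device memory. Depending on how much of the array you actually need to access, it could be faster than copying everything to global memory and reading it from there.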
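The mapped-memory idea above can be sketched roughly as follows. This is a minimal example, assuming a device and platform that support mapped pinned memory; `gatherFromMapped` and `readViaMapped` are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// The kernel dereferences a device pointer that aliases pinned host
// memory, so each read travels over the bus on demand.
__global__ void gatherFromMapped(const float* table, int flatIndex, float* out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = table[flatIndex];
    }
}

// Hypothetical host helper: the table stores y * width + x at (x, y)
// and is never copied to device memory.
float readViaMapped(int width, int height, int x, int y)
{
    // Older setups may need cudaSetDeviceFlags(cudaDeviceMapHost)
    // before any other CUDA call; with unified addressing this is
    // typically already enabled (an assumption of this sketch).
    size_t count = static_cast<size_t>(width) * height;

    float* hTable = nullptr;
    cudaHostAlloc(reinterpret_cast<void**>(&hTable), count * sizeof(float),
                  cudaHostAllocMapped);
    for (size_t i = 0; i < count; ++i) hTable[i] = static_cast<float>(i);

    // Device-side alias of the same host allocation.
    float* dTable = nullptr;
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&dTable), hTable, 0);

    float* dOut = nullptr;
    cudaMalloc(&dOut, sizeof(float));

    gatherFromMapped<<<1, 32>>>(dTable, y * width + x, dOut);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    cudaFreeHost(hTable);
    return result;
}
```

Because nothing is copied up front, this tends to pay off only when the kernel touches a small fraction of the array; if most elements are read repeatedly, a one-time copy to device memory is likely the better choice.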
