In a CUDA device, each SM has 64KB of on-chip memory. By default, this is partitioned into 48KB of shared memory and 16KB of L1 cache. For kernels whose memory access patterns are hard to predict (and which therefore benefit more from caching than from explicitly managed shared memory), the partitioning can be changed to 16KB of shared memory and 48KB of L1 cache.
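For context, this is the kind of configuration I mean, selected per device or per kernel through the runtime API. A minimal sketch; the kernel name `mykernel` and its trivial body are just placeholders:

```
#include <cuda_runtime.h>

// Placeholder kernel; any __global__ function would do here.
__global__ void mykernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main()
{
    // Device-wide preference: 48KB L1 / 16KB shared memory.
    cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);

    // Or per kernel: overrides the device-wide setting for mykernel only.
    cudaFuncSetCacheConfig(mykernel, cudaFuncCachePreferL1);

    // ... allocate memory, launch mykernel, etc.
    return 0;
}
```

Neither setting lets the full 64KB be used as L1, which is what my question is about.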
Why doesn't CUDA allow all of the 64KB per-SM on-chip memory to be used as L1 cache?
There are many kernels that have no use for shared memory at all but could benefit from that extra 16KB being available as L1 cache.