
Shared memory is "striped" into banks. This leads to the whole issue of bank conflicts, as we all know.

Question: But how can you determine how many banks ("stripes") exist in shared memory?

(Poking around NVIDIA "devtalk" forums, it seems that per-block shared memory is "striped" into 16 banks. But how do we know this? The threads suggesting this are a few years old. Have things changed? Is it fixed on all NVIDIA CUDA-capable cards? Is there a way to determine this from the runtime API (I don't see it there, e.g. under cudaDeviceProp)? Is there a manual way to determine it at runtime?)


1 Answer


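As @RobertHarvey said, it is documented. The programming guide states that compute capability 1.x devices have 16 banks, and compute capability 2.x and 3.x devices have 32 banks. So you can make any decision you need based on the compute capability (major version) returned in the device properties.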
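A minimal sketch of that approach, for illustration only: it reads the `major` field of `cudaDeviceProp` via `cudaGetDeviceProperties` and maps it to the bank counts quoted above. The helper name `sharedMemBanksFor` is hypothetical (it is not a CUDA API), and querying device 0 is an assumption. Anything newer than 3.x is reported as unknown here, since this answer does not cover it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): maps a compute capability
// major version to the shared memory bank count quoted in the answer.
// Returns -1 for major versions the answer does not cover.
int sharedMemBanksFor(int major) {
    switch (major) {
        case 1:  return 16;  // compute capability 1.x
        case 2:              // compute capability 2.x
        case 3:  return 32;  // compute capability 3.x
        default: return -1;  // not covered here; consult the programming guide
    }
}

int main() {
    const int device = 0;  // assumption: query the first device
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    const int banks = sharedMemBanksFor(prop.major);
    if (banks < 0) {
        std::printf("Compute capability %d.%d: bank count not covered by this table\n",
                    prop.major, prop.minor);
    } else {
        std::printf("Compute capability %d.%d: %d shared memory banks\n",
                    prop.major, prop.minor, banks);
    }
    return 0;
}
```

Built with something like `nvcc banks.cu -o banks`, this prints one line for the chosen device. Keeping the table in a separate function means it can be extended from the programming guide without touching the query code.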

A general link to the CUDA online documentation is included in the info link of the cuda tag.

Answered 2013-06-10T15:29:45.247