Shared memory is "striped" into banks. This is what gives rise to bank conflicts: accesses from several threads of a warp that land on different addresses in the same bank get serialized.
Question: How can you determine how many banks ("stripes") shared memory has?
(Poking around the NVIDIA "devtalk" forums, it seems that per-block shared memory is "striped" into 16 banks. But how do we know this? The threads suggesting it are a few years old. Have things changed? Is the number the same on all CUDA-capable NVIDIA cards? Is there a way to query it through the runtime API (I don't see it there, e.g. in cudaDeviceProp)? Is there a manual way to determine it at runtime?)