I'm having trouble analyzing the L2 cache on a compute capability 3.5 CUDA card. On Kepler (3.x), loads from global memory are cached only in L2, never in L1. My question is: how do I use nvprof (the command-line profiler) to find the hit rate my global loads achieve in the L2 cache? I've queried all the metrics I can collect, and the ones that deal with L2 are:
l2_l1_read_hit_rate: Hit rate at L2 cache for all read requests from L1 cache
l2_texture_read_hit_rate: Hit rate at L2 cache for all read requests from texture cache
l2_l1_read_throughput: Memory read throughput seen at L2 cache for read requests from L1 cache
l2_texture_read_throughput: Memory read throughput seen at L2 cache for read requests from the texture cache
l2_read_transactions: Memory read transactions seen at L2 cache for all read requests
l2_write_transactions: Memory write transactions seen at L2 cache for all write requests
l2_read_throughput: Memory read throughput seen at L2 cache for all read requests
l2_write_throughput: Memory write throughput seen at L2 cache for all write requests
l2_utilization: The utilization level of the L2 cache relative to the peak utilization
The only hit rate I get is for reads coming from the L1 cache. But reads from global memory never come from L1, because they are not cached there! Or am I wrong here, and this is exactly the metric I want?
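For reference, this is the kind of invocation I've been using to collect that metric (where ./myapp is just a placeholder for my application binary):

    nvprof --metrics l2_l1_read_hit_rate ./myapp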
Surprisingly (or not), there is also a metric that gives the L1 hit rate for global memory reads:
l1_cache_global_hit_rate: Hit rate in L1 cache for global loads
Can this ever be non-zero on Kepler?
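If it matters, the way I was planning to check this empirically is to request both hit-rate metrics in a single run on the cc 3.5 device (again, ./myapp is a placeholder name):

    nvprof --metrics l1_cache_global_hit_rate,l2_l1_read_hit_rate ./myapp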
Cheers!