performance - Why does cachegrind ignore the L3 cache, contradicting the documentation?

Question

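I want to learn how people do cache optimization, and a friend suggested cachegrind as a useful tool for that.

Valgrind is a CPU simulator, and when running cachegrind it assumes a 2-level cache, as the documentation states:

Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2). This exactly matches the configuration of many modern machines.

The next paragraph continues:

However, some modern machines have three or four levels of cache. For these machines (in the cases where Cachegrind can auto-detect the cache configuration) Cachegrind simulates the first-level and last-level caches. The reason for this choice is that the last-level cache has the most influence on runtime, as it masks accesses to main memory.

However, when I ran valgrind on my simple matrix-matrix multiplication code, I got the following output.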

==6556== Cachegrind, a cache and branch-prediction profiler
==6556== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote et al.
==6556== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==6556== Command: ./a.out
==6556== 
--6556-- warning: L3 cache detected but ignored
==6556== 
==6556== I   refs:      50,986,869
==6556== I1  misses:         1,146
==6556== L2i misses:         1,137
==6556== I1  miss rate:       0.00%
==6556== L2i miss rate:       0.00%
==6556== 
==6556== D   refs:      20,232,408  (18,893,241 rd   + 1,339,167 wr)
==6556== D1  misses:       150,194  (   144,869 rd   +     5,325 wr)
==6556== L2d misses:        10,451  (     5,506 rd   +     4,945 wr)
==6556== D1  miss rate:        0.7% (       0.7%     +       0.3%  )
==6556== L2d miss rate:        0.0% (       0.0%     +       0.3%  )
==6556== 
==6556== L2 refs:          151,340  (   146,015 rd   +     5,325 wr)
==6556== L2 misses:         11,588  (     6,643 rd   +     4,945 wr)
==6556== L2 miss rate:         0.0% (       0.0%     +       0.3%  )

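According to the documentation, the L1 and L3 caches should be simulated, yet the output says the L3 cache is being ignored. Why is that?

Also, does cachegrind assume fixed sizes for the L1 and last-level caches, or does it use the L1 and last-level cache sizes of the CPU it is currently running on?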

score 2 · Accepted Answer

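You are running on an Intel CPU that cachegrind does not seem to fully support. It checks the cpuid flags and determines support from a huge case statement covering different processors.

This is from an unofficial copy of the code, but it is illustrative - https://github.com/koriakin/valgrind/blob/master/cachegrind/cg-x86-amd64.c :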

/* Intel method is truly wretched.  We have to do an insane indexing into an
 * array of pre-defined configurations for various parts of the memory
 * hierarchy.
 * According to Intel Processor Identification, App Note 485.
 */
static
Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* L2c)
{
...
      case 0x22: case 0x23: case 0x25: case 0x29:
      case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
      case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
          VG_(dmsg)("warning: L3 cache detected but ignored\n");
          break;
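To make the mechanism in the excerpt concrete, here is a minimal, self-contained sketch of the same kind of check. It is not Valgrind's code: the function name `is_ignored_l3_descriptor` is hypothetical, and it assumes the case labels in the excerpt are cache descriptor bytes of the kind documented in the Intel application note the comment cites. The list of values is copied from the excerpt and nothing else is inferred about the elided parts of the original function.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of Valgrind): reports whether a cache
 * descriptor byte is one of the values that, in the excerpt above, only
 * produces the "L3 cache detected but ignored" warning.
 * Assumption: the values are Intel cache descriptor bytes. */
bool is_ignored_l3_descriptor(unsigned char descriptor)
{
    switch (descriptor) {
    /* Same case labels as in the quoted excerpt. */
    case 0x22: case 0x23: case 0x25: case 0x29:
    case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
    case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
        return true;   /* recognised as L3, but not simulated */
    default:
        return false;  /* handled elsewhere, or unknown to this sketch */
    }
}
```

With this sketch, `is_ignored_l3_descriptor(0x22)` is true and `is_ignored_l3_descriptor(0x00)` is false. In the excerpt the matching branch only prints the warning and breaks, which is consistent with the output in the question still reporting L2 rather than L3 figures.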
