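I am looking for a way to estimate the number of L3 cache misses on my Linux PC with an Intel CPU (Intel i7 6700, Skylake) using the "IA32_PERFEVTSELx" and "IA32_PMCx" MSR pairs. To do this, I installed a timer in the kernel that reports the PMC value periodically (every second). In the code, I read the IA32_PMC1 MSR (address 0xC2) after writing "0x41412E" to the IA32_PERFEVTSEL1 MSR (address 0x187), where the Event Select field is 0x2E, the UMask field is 0x41, bit 16 is the USR bit, and bit 22 is the enable bit: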
uint64_t val = 0x41412E; // UMask 0x41, Event Select 0x2E, USR (bit 16), EN (bit 22)
uint64_t ret = 0x0;

// rdmsrl_safe()/wrmsrl_safe() transfer the full 64-bit value;
// rdmsr_safe()/wrmsr_safe() split it into separate low/high 32-bit halves.
// 0x187 is the address of IA32_PERFEVTSEL1; program it only if it is not already set.
if ( rdmsrl_safe(0x187, &ret) || ret != val ) {
    if ( wrmsrl_safe(0x187, val) ) {
        TEMP_DEBUG("failed to write msr!!!");
    }
}
if ( rdmsrl_safe(0xC2, &ret) ) { // 0xC2 is the address of IA32_PMC1
    TEMP_DEBUG("failed to read msr!!!");
} else {
    TEMP_DEBUG("rdmsr: %llu", (unsigned long long)ret);
}
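I expected this value to be the number of L3 cache misses, but it looks strange. It seems far too high, so I suspect it is not the L3 miss count, and I could not find what it means in the manual (Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide). The values I observed are as follows: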
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 8595908 at start_shscan(56) in mcsched.c
rdmsr: 17274482 at start_shscan(56) in mcsched.c
rdmsr: 21449216 at start_shscan(56) in mcsched.c
rdmsr: 26305745 at start_shscan(56) in mcsched.c
rdmsr: 26511242 at start_shscan(56) in mcsched.c
rdmsr: 33316291 at start_shscan(56) in mcsched.c
rdmsr: 34736360 at start_shscan(56) in mcsched.c
rdmsr: 35151932 at start_shscan(56) in mcsched.c
rdmsr: 43806356 at start_shscan(56) in mcsched.c
rdmsr: 51132302 at start_shscan(56) in mcsched.c
rdmsr: 59797757 at start_shscan(56) in mcsched.c
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 6820029 at start_shscan(56) in mcsched.c
rdmsr: 8322078 at start_shscan(56) in mcsched.c
rdmsr: 63313471 at start_shscan(56) in mcsched.c
rdmsr: 397962 at start_shscan(56) in mcsched.c
rdmsr: 9429026 at start_shscan(56) in mcsched.c
rdmsr: 18124455 at start_shscan(56) in mcsched.c
rdmsr: 23706367 at start_shscan(56) in mcsched.c
rdmsr: 27087960 at start_shscan(56) in mcsched.c
rdmsr: 68769660 at start_shscan(56) in mcsched.c
rdmsr: 69110424 at start_shscan(56) in mcsched.c
rdmsr: 78216541 at start_shscan(56) in mcsched.c
rdmsr: 87385467 at start_shscan(56) in mcsched.c
rdmsr: 95083478 at start_shscan(56) in mcsched.c
rdmsr: 101347654 at start_shscan(56) in mcsched.c
rdmsr: 8327692 at start_shscan(56) in mcsched.c
rdmsr: 27377092 at start_shscan(56) in mcsched.c
rdmsr: 36316258 at start_shscan(56) in mcsched.c
rdmsr: 45323291 at start_shscan(56) in mcsched.c
rdmsr: 54366010 at start_shscan(56) in mcsched.c
rdmsr: 63135801 at start_shscan(56) in mcsched.c
rdmsr: 72037000 at start_shscan(56) in mcsched.c
rdmsr: 81032798 at start_shscan(56) in mcsched.c
rdmsr: 89975340 at start_shscan(56) in mcsched.c
rdmsr: 98661287 at start_shscan(56) in mcsched.c
rdmsr: 107482921 at start_shscan(56) in mcsched.c
rdmsr: 116290561 at start_shscan(56) in mcsched.c
rdmsr: 125135979 at start_shscan(56) in mcsched.c
rdmsr: 133920103 at start_shscan(56) in mcsched.c
rdmsr: 142695638 at start_shscan(56) in mcsched.c
rdmsr: 151456156 at start_shscan(56) in mcsched.c
rdmsr: 160171239 at start_shscan(56) in mcsched.c
rdmsr: 168879495 at start_shscan(56) in mcsched.c
rdmsr: 177788861 at start_shscan(56) in mcsched.c
rdmsr: 186589920 at start_shscan(56) in mcsched.c
rdmsr: 195331675 at start_shscan(56) in mcsched.c
rdmsr: 204166715 at start_shscan(56) in mcsched.c
rdmsr: 213045449 at start_shscan(56) in mcsched.c
rdmsr: 221942627 at start_shscan(56) in mcsched.c
rdmsr: 231073520 at start_shscan(56) in mcsched.c
Did I make a mistake in my code? If not, please give me some advice on how to interpret these values.
======================= Added below =======================
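@Peter Cordes, I referred to the Intel manual (Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide), and I intend to use "LLC Misses", which is one of the pre-defined architectural performance events listed in the table below: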
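I think an example with perf will make this clearer: I can use "perf stat -e r412e ls" to estimate the L3 cache misses during the "ls" command. "r412e" breaks down into "r" + "41" + "2e": r stands for "Raw hardware event descriptor", 41 is the UMask (0x41), and 2e is the Event Select (0x2e). You can see this with "perf list".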