linux - Is it possible to generate perf-stat results from a perf.data file?

Question

When I want to generate performance reports with perf-stat and perf-report from the Linux perf tool suite, I run:

$ perf record -o my.perf.data myCmd
$ perf report -i my.perf.data

and:

$ perf stat myCmd

But that means running "myCmd" a second time, which takes several minutes. Instead, I was hoping for:

$ perf stat -i my.perf.data

But unlike most of the tools in the perf suite, I don't see a -i option for perf-stat. Is there another tool that does this, or a way to get perf-report to produce output similar to perf-stat?

score 4 · Accepted Answer

I dug through the source on kernel.org, and it looks like there is no way to get perf stat to parse perf.data:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-stat.c;h=c70d72003557f17f29345b0f219dc5ca9f572d75;hb=refs/heads/linux-2.6.33.y

If you look at line 245 you will see the function "run_perf_stat", and the lines around 308-320 seem to be what actually does the recording and collating.

I did not dig into this far enough to determine whether the kind of functionality you want could be enabled.

It does not look like perf report has much extra formatting capability either. If you like, you can check further here:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-report.c;h=860f1eeeea7dbf8e43779308eaaffb1dbcf79d10;hb=refs/heads/linux-2.6.33.y

score 1 · Accepted Answer

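perf stat cannot be used to parse a perf.data file, but you can ask perf report to print its header with event count estimates: perf report --header | egrep Event\|Samples. Only events that were recorded into the perf.data file will be estimated.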

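perf stat uses the hardware performance monitoring unit in counting mode, while perf record and perf report (with a perf.data file) use the same hardware unit configured in periodic overflow mode (sampling profiling). In both modes, the hardware performance counters are set up through their control registers to count some set of performance events (for example CPU cycles or executed instructions), and the hardware increments each counter on every matching event.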

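In counting mode, perf stat uses counters that are set to zero at program start. The hardware increments them, and perf reads the final counter values at program exit (in practice the OS splits the counting into several segments, but the end result is the same: a single value for the full program run).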

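In profiling mode, perf record sets each hardware counter to some negative value, for example -200000, and an overflow handler is registered and enabled (the actual value is autotuned by the OS kernel to reach a target sampling frequency). After every 200000 counted events the counter overflows from -1 to zero and raises an overflow interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack in -g mode) into a ring buffer (mmapped by perf), and that data is then saved into perf.data. The handler also resets the counter to -200000. After a long enough run, many samples are stored in perf.data. This sample set can be used to build a statistical profile of the program (which parts of the program ran more often). And since one sample is generated per 200000 events, the total event count can be estimated as well. With the kernel's autotuning of the period (it tries to generate samples at 4000 Hz) the estimate is harder to make; use something like -c 1000000 to disable autotuning of the sampling period.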
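The overflow mechanism described above can be imitated with a toy model. The sketch below is plain Python, not perf or kernel code: simulate_sampling and its parameters are hypothetical names, and a real overflow handler records much more per sample. It only illustrates why samples × period approximates the true event count, with an error of less than one period.

```python
def simulate_sampling(total_events, period):
    """Toy model of a hardware counter in overflow (sampling) mode.

    Hypothetical helper, not a perf API.
    Returns (samples, estimated_events).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    counter = -period              # the counter is armed with a negative value
    samples = 0
    remaining = total_events
    while remaining > 0:
        step = min(remaining, -counter)  # events until the next overflow
        counter += step
        remaining -= step
        if counter == 0:           # overflow from -1 to 0 raises the interrupt
            samples += 1           # the handler records one sample ...
            counter = -period      # ... and re-arms the counter
    return samples, samples * period


# 828,234,675 is the cycle count perf stat reports in the example run;
# 200000 is the illustrative period from the paragraph above.
samples, estimate = simulate_sampling(total_events=828_234_675, period=200_000)
print(samples, estimate)
```

With these inputs the model takes 4141 samples and estimates 828,200,000 events: the 34,675 events counted after the last overflow never produce a sample, which is the kind of error a sample-based estimate always carries.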

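What does perf stat show in default mode? On some x86_64 CPU I get: the running time of the program (task-clock and elapsed), 3 software events (context switches, CPU migrations, page faults), and 4 hardware counters (cycles, instructions, branches, branch-misses):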

$ echo '3^123456%3' | perf stat bc
0
 Performance counter stats for 'bc':
        325.604672      task-clock (msec)         #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               181      page-faults               #    0.556 K/sec                  
       828,234,675      cycles                    #    2.544 GHz                    
     1,840,146,399      instructions              #    2.22  insn per cycle         
       348,965,282      branches                  # 1071.745 M/sec                  
        15,385,371      branch-misses             #    4.41% of all branches        
       0.326152702 seconds time elapsed

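What does perf record record in default mode? When hardware events are available, it is the cycles event. With a single wakeup (ring buffer overflow), perf saved 1293 samples into perf.data: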

$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]

You can look into the perf.data contents with perf report --header | less, perf script, or perf script -D:

$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l 
1293

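perf.data contains some timestamps and some additional events for program start and exit (perf script -D | egrep exec\|EXIT), but by default there is not enough information in perf.data to fully reconstruct the perf stat output. The running time is recorded only as the timestamps of start, exit, and each event sample; software events are not recorded; and only a single hardware event is used (cycles, with no instructions, branches, or branch-misses). An approximation of that hardware counter is available, but it is not exact (the real cycle count was around 820-825 million):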

$ perf report --header |grep Event
# Event count (approx.): 836622729

With a non-default recording into perf.data, perf report can estimate more events:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
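As a rough illustration of how far these estimates go toward a perf stat-style summary, the sketch below parses the header lines shown above and derives two of the ratios perf stat prints. It is a minimal sketch: parse_event_counts is a hypothetical helper, the sample text is copied from the output above instead of being read from a live perf run, and the regular expressions assume the header format shown here, which may differ between perf versions.

```python
import re

# Sample text copied from the `perf report --header | egrep Event\|Samples`
# output above; on a real system it would be captured from that command.
HEADER = """\
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
"""


def parse_event_counts(text):
    """Map event name -> approximate count (hypothetical helper)."""
    counts = {}
    event = None
    for line in text.splitlines():
        m = re.match(r"#\s*Samples:.*of event '([^']+)'", line)
        if m:
            event = m.group(1)      # remember which event the next count is for
            continue
        m = re.match(r"#\s*Event count \(approx\.\):\s*(\d+)", line)
        if m and event is not None:
            counts[event] = int(m.group(1))
            event = None
    return counts


counts = parse_event_counts(HEADER)
ipc = counts["instructions"] / counts["cycles"]
miss_rate = 100.0 * counts["branch-misses"] / counts["branches"]

for name, value in counts.items():
    print(f"{value:>18,}      {name}")
print(f"insn per cycle:   {ipc:.2f}")
print(f"branch-miss rate: {miss_rate:.2f}%")
```

The derived ratios come out at roughly 2.2 instructions per cycle and roughly 4.4% branch misses, close to the 2.22 and 4.41% that perf stat printed for the same command. Task-clock, elapsed time, and the software events still cannot be recovered this way, because they were not recorded.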

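A fixed period can be used, but the kernel may throttle some events if the value of the -c option is too low (samples should not be generated more often than 1000-4000 times per second):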

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses -c 1000000 bc
[ perf record: Captured and wrote 0.118 MB perf.data (3029 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 823  of event 'cycles'
# Event count (approx.): 823000000
# Samples: 1K of event 'instructions'
# Event count (approx.): 1842000000
# Samples: 349  of event 'branches'
# Event count (approx.): 349000000
# Samples: 15  of event 'branch-misses'
# Event count (approx.): 15000000
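With a fixed period, each approximate count is simply the number of samples times the -c value, which makes the numbers easy to cross-check. The sketch below does that arithmetic on the figures above; samples_per_event is a hypothetical helper, and the counts are copied from the output instead of being read from perf.data.

```python
PERIOD = 1_000_000  # the -c value used in the command above

# Approximate event counts copied from the header output above.
event_counts = {
    "cycles": 823_000_000,
    "instructions": 1_842_000_000,
    "branches": 349_000_000,
    "branch-misses": 15_000_000,
}


def samples_per_event(counts, period):
    """Recover each event's sample count from count = samples * period."""
    result = {}
    for name, count in counts.items():
        samples, rest = divmod(count, period)
        if rest:
            raise ValueError(f"{name}: count is not a multiple of the period")
        result[name] = samples
    return result


samples = samples_per_event(event_counts, PERIOD)
total = sum(samples.values())
print(samples)
print("total samples:", total)
```

The per-event sample counts (823, 1842, 349, 15) add up to 3029, the number of samples perf record reported, and they show that the "1K" printed for instructions is a rounded 1842. The resolution of each estimate is one period, so branch-misses, with only 15 samples, is known only to within about 1,000,000 events.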
