Questions tagged "perf" - Stack Overflow Chinese site

0 votes

1 answer

1396 views

cpu-architecture - Units of perf stat statistics

I'm using perf stat, and to better understand how the tool works, I wrote a program that copies one file's contents into another. I ran the program on a 750 MB file; the stats are below.

What are the units of each number? That is, are they bits, bytes, or something else? Thanks in advance.

2014-04-11T21:51:42.697

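0 votes

1 answer

169 views

linux - Finding the source location of mispredicted branches

I am trying to find where the most frequently mispredicted branches are located within a function. I tried perf, as follows:

perf record ./a.out

a.out was compiled with the options -ggdb -fno-omit-frame-pointer, as the manual suggests.

How can I find these spots?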

linux branch performancecounter perf

2014-04-12T17:45:32.393

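0 votes

2 answers

5429 views

linux - How to generate an "Off-CPU" profile with the Linux `perf` tool

Brendan D. Gregg (author of the DTrace book) has an interesting profiling variant: "Off-CPU" profiling (and the Off-CPU Flame Graph; slides 2013, p112-137), which shows where a thread or application was blocked (not being executed by the CPU, but waiting for I/O, for a page-fault handler, or descheduled because of a shortage of CPU resources):

This time reveals which code paths are blocked and waiting while off-CPU, and for exactly how long. This differs from traditional profiling, which typically samples thread activity at a given interval and (usually) examines threads only while they are executing work on-CPU.

He can also combine Off-CPU profile data with the On-CPU profile: http://www.brendangregg.com/FlameGraphs/hotcoldflamegraphs.html

The examples Gregg gives are made with dtrace, which is not commonly available on Linux. But there are some similar tools (ktap, systemtap, perf), and I think perf has the widest installed base of them. Usually perf generates On-CPU profiles (which functions were executed more often on the CPU).

How can I translate Gregg's Off-CPU examples to the perf profiling tool on Linux?

PS: The slides from LISA13, p124, link to a systemtap variant of Off-CPU flame graphs: "Yichun Zhang created these, and has been using them on Linux with SystemTap to collect the profile data. See: • http://agentzh.org/misc/slides/off-cpu-flame-graphs.pdf" (CloudFlare Beer Meeting on 23 August 2013)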

linux profiling wait perf

2014-04-16T02:51:38.680

0 votes

4 answers

34027 views

macos - Install "perf" on Mac

I need the "perf" utility to monitor a program on my Mac. I know Linux comes with it, but is it available on Mac?

I am working on OS X 10.9 Mavericks and tried "port search" for perf and linux-tools, but I couldn't get any results.

macos profiling osx-mavericks performancecounter perf

2014-04-21T15:45:57.707

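0 votes

1 answer

4199 views

c - Counting the number of instructions executed by a binary using pin, perf, and valgrind

I am new to the pin tool for dynamic binary instrumentation. I was trying to write some simple clients using the pin tool API. One such simple client counts the number of instructions a binary executes, which is provided as one of pin's examples.
I wrote a very basic program in C,

and compiled it with gcc. When I used the pin tool to count the instructions for the binary of the above C program, it gave me 96072.
When I used valgrind for the same task, it gave me 97487, which is nearly equal to the former. But when I used perf, the answer was 421,256. What is the reason for this difference between the tools?
To find more details, I compiled the C program to x86 assembly, which contains about 20-30 lines of assembly instructions, but when I use objdump to disassemble the binary, it produces 200-300 lines of assembly instructions. I cannot figure out the reason for this difference either. I am running 64-bit Ubuntu 12.04 with Linux kernel version 3.8.0-39. Thanks in advance.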

c assembly valgrind x86-64 perf

2014-04-25T09:51:40.870

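0 votes

3 answers

5873 views

c - Measuring page faults from a C program

I am comparing some system calls in which I read from and write to memory. Is there any API defined to measure page faults (pages in/out) in C?

I found the library libperfstat.a, but it is for AIX; for Linux I couldn't find anything.

Edit: I am aware of the time and perf-stat commands in Linux; I am just exploring whether there is anything I can use inside a C program.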
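One option that may fit this question is the POSIX `getrusage()` call, whose `ru_minflt` and `ru_majflt` fields report the minor and major page faults of the calling process. The following is only a minimal sketch under that assumption: the `fault_counts` struct and both helper functions are hypothetical names introduced for illustration, and the workload (touching one byte per page of a fresh allocation) is a stand-in for whatever code is actually being measured.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper type: a snapshot of this process's fault counters. */
struct fault_counts {
    long minor; /* ru_minflt: faults served without disk I/O */
    long major; /* ru_majflt: faults that required disk I/O */
};

/* Fill *out with the current counters; returns 0 on success, -1 on error. */
int read_fault_counts(struct fault_counts *out)
{
    struct rusage usage;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    out->minor = usage.ru_minflt;
    out->major = usage.ru_majflt;
    return 0;
}

/*
 * Illustrative workload (an assumption, not taken from the question):
 * allocate `bytes` and write one byte per page so each page gets mapped.
 * Returns the minor faults observed around the loop, or -1 on error.
 */
long minor_faults_for_touch(size_t bytes)
{
    struct fault_counts before, after;
    long page = sysconf(_SC_PAGESIZE);
    volatile char *buf; /* volatile keeps the writes from being optimized out */
    size_t i;
    int rc;

    if (page <= 0 || read_fault_counts(&before) != 0)
        return -1;
    buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    for (i = 0; i < bytes; i += (size_t)page)
        buf[i] = 1;
    rc = read_fault_counts(&after);
    free((void *)buf);
    return rc == 0 ? after.minor - before.minor : -1;
}
```

Calling `read_fault_counts()` before and after the region of interest and subtracting the two snapshots gives a per-region count. The counters are process-wide, so faults from other threads are included.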

c linux perf page-fault

2014-04-25T20:49:19.603

0 votes

1 answer

1630 views

c - read() system call page fault doesn't depend on file size

I am reading files of different sizes (1 KB - 1 GB) using read() in C. But every time I check the page faults using perf-stat, it gives me (almost) the same values.

My machine: (fedora 18 on a Virtual Machine, RAM - 1GB, Disk space - 20 GB)

My code:

Perf-stat output: (shows file size, time to read the file and the # of page faults)

Questions:
1. How can the page faults for a read() of a 1 KB file and a 1 GB file be the same? Since I am reading the data too (code line #84), I am making sure the data is actually being read.
2. The only reason I can think of for not encountering that many page faults is that the data is already present in main memory. If that is the case, how can I flush it so that my code shows the true page faults when I run it? Otherwise I can never measure the true performance of read().
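One plausible explanation, assuming the omitted code reads through a single fixed-size buffer, is that read() copies data from the kernel's page cache into that buffer, so the only user-space pages touched are the buffer's own, however large the file is. The sketch below illustrates that pattern; `minor_faults_now` and `faults_while_reading` are hypothetical helpers, not the asker's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Minor-fault count of the calling process so far, or -1 on error. */
static long minor_faults_now(void)
{
    struct rusage usage;

    return getrusage(RUSAGE_SELF, &usage) == 0 ? usage.ru_minflt : -1;
}

/*
 * Read fd to end-of-file through one reusable buffer of buf_size bytes.
 * Stores the total number of bytes read in *total and returns the minor
 * faults observed during the loop, or -1 on error.
 */
long faults_while_reading(int fd, size_t buf_size, long long *total)
{
    char *buf;
    long before, after;
    ssize_t n;

    assert(buf_size > 0 && total != NULL);
    buf = malloc(buf_size);
    if (buf == NULL)
        return -1;
    *total = 0;
    before = minor_faults_now();
    /* Every pass overwrites the same buffer pages. */
    while ((n = read(fd, buf, buf_size)) > 0)
        *total += n;
    after = minor_faults_now();
    free(buf);
    if (n < 0 || before < 0 || after < 0)
        return -1;
    return after - before;
}
```

If this matches the original program, the fault count would be expected to track the buffer size rather than the file size, whereas an mmap-based reader touches a new page for each page of the file.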

Edit1:
echo 3 > /proc/sys/vm/drop_caches doesn't help, the output still remains the same.

Edit2: For mmap, the output of perf-stat is:

c linux perf page-fault

2014-04-26T23:34:14.783

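0 votes

0 answers

529 views

linux - Does the ARM "cpsie" instruction cause TLB misses?

While profiling my program, I found that the "_raw_spin_unlock_irq" system call causes a lot of iTLB misses on an ARM Cortex A15 board. After carefully inspecting the assembly code, I found that the "cpsie" instruction may be one of the causes. So I wrote a short piece of code to verify my hypothesis.

Here is my code:

Then I used the perf tool to check the iTLB misses, and it reported:

89172 dTLB-load-misses

5694 dTLB-store-misses

43248 iTLB-load-misses

After removing the "cpsie i" instruction, the result is:

23453 dTLB-load-misses

1453 dTLB-store-misses

12035 iTLB-load-misses

The results show that "cpsie i" causes about 4x as many iTLB misses. I annotated the binary code using perf report, and 69.5% of the iTLB misses occur right after the "cpsie i" instruction.

I am confused about why so many iTLB misses occur after the "cpsie i" instruction. Is there any way to prevent it? Thanks!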

linux arm kernel tlb perf

2014-05-02T09:10:09.043

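0 votes

1 answer

82 views

linux - Classifying programs as compute-intensive based on performance counters

I am trying to classify a few parallel programs as compute-, memory-, or data-intensive. Can I classify them based on values obtained from performance counters such as perf? That command gives several values, such as the number of page faults, which I think could be used to tell whether or not a program needs to access memory frequently.

Is this approach correct and feasible? If not, could someone guide me on how to classify programs into their respective categories?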

Cheers, Chris

linux performance parallel-processing profiling perf

2014-05-08T17:16:17.630

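0 votes

0 answers

145 views

perf - How to split/filter events from perf.data?

Question: Is there a way to extract the samples of a specific event from a perf.data file that contains multiple events?

Context

I have recorded samples of two events, which I obtained by running something like

As far as I can see, other perf commands (e.g., perf diff and perf script) do not have an event-filter flag. So it would be useful to split my perf.data into cycles.perf.data and instructions.perf.data.

Thanks!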

perf

2014-05-12T01:27:33.020

Questions tagged [perf]
