c - How do I modify a C program so that gprof can profile it?

Question

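When I run gprof on my C program, it says no time was accumulated for my program and shows 0 time for all function calls. It does count the function calls, though.

How do I modify my program so that gprof can measure how long things take to run?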

score 16 · Accepted Answer

Did you specify -pg when compiling?

http://sourceware.org/binutils/docs-2.20/gprof/Compiling.html#Compiling

Once it is compiled, you run the program and then run gprof on the binary.

E.g.:

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 10000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Compiling it with cc -pg test.c, running it as a.out, and then running gprof a.out gives me

granularity: each sample hit covers 4 byte(s) for 1.47% of 0.03 seconds

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 45.6       0.02     0.02    10000     0.00     0.00  __sys_write [10]
 45.6       0.03     0.02        0  100.00%           .mcount (26)
  2.9       0.03     0.00    20000     0.00     0.00  __sfvwrite [6]
  1.5       0.03     0.00    20000     0.00     0.00  memchr [11]
  1.5       0.03     0.00    10000     0.00     0.00  __ultoa [12]
  1.5       0.03     0.00    10000     0.00     0.00  _swrite [9]
  1.5       0.03     0.00    10000     0.00     0.00  vfprintf [2]

What do you get?

score 4 · Accepted Answer

I tried running Kinopiko's example, except I increased the number of iterations by a factor of 100.

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 1000000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Then I took 10 stackshots (under VC, but you can use pstack). Here are the stackshots:

9 copies of this stack:
NTDLL! 7c90e514()
KERNEL32! 7c81cbfe()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

1 copy of this stack:
KERNEL32! 7c81cb96()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

In case it isn't obvious, this tells you that:

mainCRTStartup() line 206 + 25 bytes Cost ~100% of the time
main() line 7 + 14 bytes             Cost ~100% of the time
printf() line 62 + 14 bytes          Cost ~100% of the time
_ftbuf() line 171 + 9 bytes          Cost ~100% of the time
_flush() line 162 + 23 bytes         Cost ~100% of the time
_write() line 168 + 57 bytes         Cost ~100% of the time

In other words, the program spends ~100% of its time flushing the output buffer to disk (or console) as part of the printf on line 7.

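(What I mean by "cost of a line" is this: it is the fraction of total time spent at the request of that line, which is roughly the fraction of samples that contain it. If that line could be made to take no time, such as by removing it, skipping over it, or passing its work off to an infinitely fast coprocessor, that fraction is how much the total time would shrink. So if the execution of any of these lines could be avoided, total time would shrink by somewhere in the range of 95% to 100%. If you were to ask "What about recursion?", the answer is It Makes No Difference.)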

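Now, maybe you want to know something else, like how much time is spent in the loop. To find that out, remove the printf, because it's hogging all the time. Or maybe you want to know what percentage of time is spent purely in CPU time, not in system calls. To get that, just throw away any stackshots that don't end in your code.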

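The point I'm trying to make is that if you're looking for things you can fix to make the code run faster, the data gprof gives you, even if you understand it, is almost useless. By comparison, if some of your code is causing more wall-clock time to be spent than you would like, stackshots will pinpoint it.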

score 0 · Accepted Answer

One gotcha with gprof: it doesn't work with code in dynamically-linked libraries. For that, you need to use sprof. See this answer: gprof : How to generate call graph for functions in shared library that is linked to main program

score -3 · Accepted Answer

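First compile your application with -g, and check which CPU counters you are using. If your application runs very quickly, gprof could miss all the events, or collect fewer than it needs (in that case, reduce the number of events to read).

Actually, profiling should show you CPU_CLK_UNHALTED or INST_RETIRED events without any special switches. But with data like that you'll only be able to say how well your code is performing: INST_RETIRED/CPU_CLK_UNHALTED.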

Try the Intel VTune analyzer - it's free for 30 days and for educational use.
