c++ - gprof vs cachegrind profiles

Question

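While trying to optimize some code, I'm a bit puzzled by the differences between the profiles produced by kcachegrind and gprof. Specifically, if I use gprof (compiling with the -pg switch, etc.), I get this: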

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 89.62      3.71     3.71   204626     0.02     0.02  objR<true>::R_impl(std::vector<coords_t, std::allocator<coords_t> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&) const
  5.56      3.94     0.23 18018180     0.00     0.00  W2(coords_t const&, coords_t const&)
  3.87      4.10     0.16   200202     0.00     0.00  build_matrix(std::vector<coords_t, std::allocator<coords_t> > const&)
  0.24      4.11     0.01   400406     0.00     0.00  std::vector<double, std::allocator<double> >::vector(std::vector<double, std::allocator<double> > const&)
  0.24      4.12     0.01   100000     0.00     0.00  Wrat(std::vector<coords_t, std::allocator<coords_t> > const&, std::vector<coords_t, std::allocator<coords_t> > const&)
  0.24      4.13     0.01        9     1.11     1.11  std::vector<short, std::allocator<short> >* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::vector<short, std::alloca

This seems to suggest that I don't need to bother looking anywhere but ::R_impl(...).

At the same time, if I compile without the -pg switch and run valgrind --tool=callgrind ./a.out, I get something rather different. Here is a screenshot of the kcachegrind output:

[Screenshot: kcachegrind output]

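If I interpret this correctly, it seems to suggest that ::R_impl(...) only takes about 50% of the time, while the other half is spent in linear algebra (Wrat(...), eigenvalues and the underlying lapack calls), which was far down the list in the gprof profile.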

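I understand that gprof and cachegrind use different techniques, and I wouldn't bother if their results were only somewhat different. But here they look very different, and I'm at a loss as to how to interpret them. Any ideas or suggestions?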

score 14 · Accepted Answer

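You are looking at the wrong column. You have to look at the second column in the kcachegrind output, the one named "self". This is the time spent in the particular subroutine only, without counting its children. The first column has the cumulative time (for main it equals 100% of the machine time), and it is not that informative (in my opinion).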

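Note that from the output of kcachegrind you can see that the total time of the process is 53.64 seconds, while the time spent in the subroutine "R_impl" is 46.72 seconds, which is 87% of the total time. So gprof and kcachegrind agree almost perfectly.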
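
To make the self vs. cumulative distinction concrete, here is a minimal sketch. Only the figures 46.72 and 53.64 come from the screenshot; the call structure, the Node type, the helper names, and the way the remaining 6.92 is split up are all made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One node of a (hypothetical) call tree.
// `self` is the cost of the function body alone, excluding its callees.
struct Node {
    std::string name;
    double self;
    std::vector<Node> children;
};

// Cumulative (inclusive) cost: own self cost plus the inclusive cost
// of every callee, computed recursively.
double inclusive(const Node& n) {
    double total = n.self;
    for (const Node& child : n.children) {
        total += inclusive(child);
    }
    return total;
}

// Share of `part` in `whole`, as a percentage.
double percent(double part, double whole) {
    assert(whole > 0.0);
    return 100.0 * part / whole;
}

// Illustrative tree. 46.72 (R_impl) and the 53.64 total are from the
// answer above; every other number and the nesting are assumptions.
Node example_tree() {
    Node eigen{"eigenvalues", 6.32, {}};
    Node wrat{"Wrat", 0.50, {eigen}};
    Node r_impl{"R_impl", 46.72, {}};
    return Node{"main", 0.10, {r_impl, wrat}};
}
```

In this sketch, inclusive(example_tree()) covers the whole run, so main's cumulative share is 100% by construction and tells you nothing. The self cost of R_impl divided by that total is what should be compared with gprof's "self" figures.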

score 9 · Accepted Answer

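gprof is an instrumented profiler, callgrind is a sampling profiler. With an instrumented profiler you add overhead to every function entry and exit, which can skew the profile, particularly if you have relatively small functions that are called many times. Sampling profilers tend to be more accurate; they slow overall program execution slightly, but this tends to have the same relative effect on all functions.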
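
A back-of-the-envelope model of that skew, as a sketch only: it assumes a fixed hook cost per call, which is a simplification, and the function names and any numbers you feed it are hypothetical rather than measured.

```cpp
#include <cassert>

// Assumed cost model (not a measurement):
// observed time = true time + number of calls * fixed per-call hook overhead.
double observed_seconds(double true_seconds, double calls,
                        double overhead_per_call) {
    assert(true_seconds >= 0.0 && calls >= 0.0 && overhead_per_call >= 0.0);
    return true_seconds + calls * overhead_per_call;
}

// Percentage share of function `a` in a profile made of just `a` and `b`.
double share_percent(double a, double b) {
    assert(a + b > 0.0);
    return 100.0 * a / (a + b);
}
```

For example, plug in the call counts from the flat profile in the question (18018180 calls for W2, 204626 for R_impl) together with made-up true times and a made-up overhead of a few nanoseconds per call. The share of the small, frequently called function grows in the observed profile, because its overhead term scales with the call count.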

Try Zoom from RotateRight (30-day free evaluation). I suspect it will give you a profile that is more consistent with callgrind than with gprof.
