
I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.

I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.

It's easy to record block I/O requests and completions with, e.g.:

sudo perf record -T -u postgres -e 'block:block_rq_*'

and the user-space pid is recorded, but no kernel or user-space stack is captured, and there's no way to snapshot parts of the user-space process's heap (say, the query text). So while you have the pid, you don't know what the process was doing at that point. All you get is perf script output like:

postgres  7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]
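Output in this form can at least be totalled per backend pid. A minimal sketch, assuming the field layout of the sample line above (event name in the fifth whitespace-separated field, sector count immediately after the `+`); `sum_block_io_by_pid` is a hypothetical helper name, not part of perf:

```shell
# Sum the sectors issued per pid from `perf script` text output.
# Assumption: lines look like the block_rq_issue sample above.
sum_block_io_by_pid() {
    awk '$5 == "block:block_rq_issue:" {
        # The sector count is the field right after the "+" marker.
        for (i = 6; i < NF; i++)
            if ($i == "+") { sectors[$2] += $(i + 1); break }
    }
    END { for (pid in sectors) print pid, sectors[pid] }' "$@"
}
```

It reads from stdin or from files given as arguments, for example `sudo perf script | sum_block_io_by_pid`. The order of the output lines is unspecified. This only narrows things down to a pid; it does not say what the process was doing.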

If I add the -g flag to perf record, it takes snapshots of the kernel stack but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes as far as the entry point from user space, such as LWLockRelease, LWLockAcquire, memcpy (mmap'd IO), __GI___libc_write, etc.

So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.

I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.


1 Answer


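OK, it looks like there are a few parts to this:

  • I'm on x86_64, where most distros build with -fomit-frame-pointer by default, and perf can't follow the stack without frame pointers;

  • .... unless it's a newer version built with libunwind support, in which case it supports perf record -g dwarf.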

I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on yourself (as is likely on Stack Overflow), rebuild it with -fno-omit-frame-pointer and -ggdb.
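As a rough illustration of passing those flags to a build, here is a small sketch. `profiling_cflags` is a hypothetical helper, and the `./configure` invocation in the comment assumes an autoconf-style source tree that honours CFLAGS:

```shell
# Append the flags that let perf walk frame pointers and resolve symbols
# to a base set of compiler flags (default base: -O2).
profiling_cflags() {
    printf '%s -fno-omit-frame-pointer -ggdb\n' "${1:--O2}"
}

# Example use (assumption: the project has an autoconf-style configure script):
#   CFLAGS="$(profiling_cflags -O2)" ./configure && make
```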

I started by rebuilding perf, because I wanted to be able to compare against the stock RPM:

  • sudo yum build-dep perf
  • sudo yum install yum-utils rpmdevtools libunwind-devel
  • yumdownloader --source perf, or download the appropriate kernel-.....src.rpm srpm
  • rpmdev-setuptree
  • rpm -Uvh kernel-*.src.rpm
  • cd $HOME/rpmbuild/SPECS
  • rpmbuild -bp --target=$(uname -m) kernel.spec

At this point you can build a new perf if you like:

  • cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
  • make

...which I did, and I confirmed that the updated perf, when built with libunwind available, does capture a useful stack.
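One way to check whether a particular perf binary was linked against libunwind is to inspect its shared-library dependencies. A sketch, assuming a dynamically linked perf and that ldd is available; `links_libunwind` is a hypothetical helper that just filters ldd output:

```shell
# Succeeds if the ldd output on stdin mentions libunwind.
links_libunwind() {
    grep -q 'libunwind'
}

# Example use, from the tools/perf build directory (not run here):
#   ldd ./perf | links_libunwind && echo "built with libunwind"
```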

You can also build a new rpm:

  • Edit kernel.spec, uncomment the line %define buildid ..., and change the buildid to .perfunwind. Note that it must read %define, not % define.

  • In the same spec file, find:

    %global perf_make \
    make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
    

    and remove NO_LIBUNWIND=1

  • rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec to produce a new perf RPM without building the whole kernel. Alternatively, omit the --without option for the kernel flavour you want built, in which case you'll also need to build headers, debuginfo, and so on.

  • sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm
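The two spec-file edits in the list above could also be scripted with sed. This is only a sketch: it assumes the commented-out line looks like `# % define buildid .local` (check your own kernel.spec first), and `patch_kernel_spec` is a hypothetical helper name:

```shell
# Print a patched copy of the spec: enable the buildid line with the
# .perfunwind suffix and drop NO_LIBUNWIND=1 from the perf make command.
patch_kernel_spec() {
    sed -e 's/^#[[:space:]]*%[[:space:]]*define[[:space:]]\{1,\}buildid[[:space:]].*/%define buildid .perfunwind/' \
        -e 's/ NO_LIBUNWIND=1//' "$@"
}

# Example use (not run here); review the diff before replacing the original:
#   patch_kernel_spec kernel.spec > kernel.spec.new && diff -u kernel.spec kernel.spec.new
```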

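See the Fedora Project guide to building a custom kernel.

I've reported this to Fedora; they shouldn't be building with NO_LIBUNWIND=1. See bug 1025603.

Once you have a rebuilt perf, you can use perf record -g dwarf to get full stacks.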
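Applied to the block I/O tracepoints from the question, that might look like the sketch below. `record_block_io` and the `PERF` variable are hypothetical conveniences, and the `-g dwarf` spelling is the one used in this answer; newer perf releases spell it `--call-graph dwarf`:

```shell
# Record block request tracepoints for one user's processes with
# DWARF-based user-space unwinding.
# PERF selects the perf command to run and is deliberately left unquoted
# so that a value like "sudo ./perf" splits into separate words.
record_block_io() {
    ${PERF:-sudo perf} record -g dwarf -u "${1:-postgres}" -e 'block:block_rq_*'
}

# Example use (not run here), pointing at the rebuilt binary:
#   PERF="sudo ./perf" record_block_io postgres
#   sudo perf script
```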

Answered 2013-11-01T03:44:43.923