c++ - Valgrind stack misses a function completely

Question

I have two C files:

a.c

void main(){
    ...
    getvtable()->function();
}

The vtable points to a function located in b.c:

void function(){
    malloc(42);
}

Now, if I trace the program in Valgrind, I get the following:

==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994==    at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994==    by 0x40A24D2: (below main) (libc-start.c:226)

So the call to function is completely omitted from the stack! How is that possible? If I use GDB, it shows a correct stack that includes "function".

Debug symbols are included. Linux, 32-bit.
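For reference, the setup described above can be condensed into a single file roughly as follows. This is only a sketch: the question elides the real code, so `struct vtable`, the body of `getvtable`, `call_count`, and `call_through_vtable` are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical table of function pointers; the question does not show the real one. */
struct vtable {
    void (*function)(void);
};

int call_count = 0;  /* demonstration only: counts calls made through the table */

/* Stands in for function() in b.c: the allocation is never freed. */
void function(void)
{
    call_count++;
    malloc(42);
}

/* Hypothetical getvtable(): returns a table whose slot points at function(). */
struct vtable *getvtable(void)
{
    static struct vtable table = { function };
    return &table;
}

/* Stands in for the call made from main() in a.c. */
void call_through_vtable(void)
{
    getvtable()->function();
}
```

The leak report is expected to list `function` between `malloc` and the caller; the question is why that frame is absent.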

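Update:

To answer the first question: I get the following output when debugging through Valgrind's GDB server. The breakpoint is never hit, whereas it is hit when I debug directly with GDB.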

stasik@gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
 eip = 0x404efcb; saved eip 0x4f2f544c
 called by frame at 0x42348a4
 Arglist at 0x4234898, args:
 Locals at 0x4234898, Previous frame's sp is 0x42348a0
 Saved registers:
  ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
 eip = 0x4f2f544c; saved eip 0x6e492056
 called by frame at 0x42348a8, caller of frame at 0x42348a0
 Arglist at 0x423489c, args:
 Locals at 0x423489c, Previous frame's sp is 0x42348a4
 Saved registers:
  eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
 eip = 0x6e492056; saved eip 0x205d6f66
 called by frame at 0x42348ac, caller of frame at 0x42348a4
 Arglist at 0x42348a0, args:
 Locals at 0x42348a0, Previous frame's sp is 0x42348a8
 Saved registers:
  eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
 eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
 called by frame at 0x42348b0, caller of frame at 0x42348a8
 Arglist at 0x42348a4, args:
 Locals at 0x42348a4, Previous frame's sp is 0x42348ac
 Saved registers:
  eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
 eip = 0x61746144; saved eip 0x65736162
 called by frame at 0x42348b4, caller of frame at 0x42348ac
 Arglist at 0x42348a8, args:
 Locals at 0x42348a8, Previous frame's sp is 0x42348b0
 Saved registers:
  eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
 eip = 0x65736162; saved eip 0x70616d20
 called by frame at 0x42348b8, caller of frame at 0x42348b0
 Arglist at 0x42348ac, args:
 Locals at 0x42348ac, Previous frame's sp is 0x42348b4
 Saved registers:
  eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
 eip = 0x70616d20; saved eip 0x2e646570
 called by frame at 0x42348bc, caller of frame at 0x42348b4
 Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
 Locals at 0x42348b0, Previous frame's sp is 0x42348b8
 Saved registers:
  eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
 eip = 0x2e646570; saved eip 0x0
 called by frame at 0x42348c0, caller of frame at 0x42348b8
 Arglist at 0x42348b4, args:
 Locals at 0x42348b4, Previous frame's sp is 0x42348bc
 Saved registers:
  eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
 eip = 0x0; saved eip 0x0
 caller of frame at 0x42348bc
 Arglist at 0x42348b8, args:
 Locals at 0x42348b8, Previous frame's sp is 0x42348c0
 Saved registers:
  eip at 0x42348bc
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.

score 5 · Accepted Answer

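I see two possible reasons:

Valgrind uses a different stack unwinding method than GDB.
The address space layout differs between the two environments, and you only hit stack corruption when running under Valgrind.

We can get more insight by using Valgrind's built-in gdbserver.

Save this Python snippet to thread-frames.py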

import gdb

f = gdb.newest_frame()
while f is not None:
    f.select()
    gdb.execute('info frame')
    f = f.older()

t.gdb

set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit

v.gdb

set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit

(Change MY-PROGRAM as needed, both in the scripts above and in the commands below.)

Get detailed information about the stack frames under GDB:

$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbffff310
 source language c.
 Arglist at 0xbffff2e8, args: 
 Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
 Saved registers:
  ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
 caller of frame at 0xbffff2f0
 source language c.
 Arglist at 0xbffff2f8, args: 
 Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
 Saved registers:
  ebp at 0xbffff2f8, eip at 0xbffff30c

Get the same data under Valgrind:

$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM

In another shell:

$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbe88e2e0
 source language c.
 Arglist at 0xbe88e2b8, args: 
 Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
 Saved registers:
  ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
 caller of frame at 0xbe88e2c0
 source language c.
 Arglist at 0xbe88e2c8, args: 
 Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
 Saved registers:
  ebp at 0xbe88e2c8, eip at 0xbe88e2dc

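If GDB can successfully unwind the stack when connected to "valgrind --vgdb", then the problem lies in Valgrind's stack unwinding algorithm. You can carefully inspect the "info frame" output for inlined and tail-call frames, or for anything else that might throw Valgrind off. Otherwise it is probably stack corruption.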

score 5 · Accepted Answer

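OK, compiling all the .so parts and the main program with an explicit -O0 seems to solve the problem. It seems that some of the optimizations in the "core" program that loads the .so (the .so itself was always compiled unoptimized) were breaking the stack.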

score 2 · Accepted Answer

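This is tail call optimization in action.

Calling malloc is the last thing that function does. The compiler sees this and tears down the stack frame of function before it calls malloc. The advantage is that when malloc returns, it returns directly to whichever function called function. That is, it avoids having malloc return to function only to hit yet another return instruction.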

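In this case the optimization has saved one unnecessary jump and made stack usage slightly more efficient, which is nice. In the case of a recursive tail call, however, this optimization is a huge win, because it turns the recursion into something more like iteration.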
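As a rough illustration of the recursive case, the two functions below compute the same sum. The names `sum_to_rec` and `sum_to_loop` are made up for this sketch, and whether a given compiler actually performs the transformation depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical example. The recursive call is the last action, so an
   optimizing compiler may reuse the current frame instead of pushing a new one. */
uint64_t sum_to_rec(uint32_t n, uint64_t acc)
{
    if (n == 0)
        return acc;
    return sum_to_rec(n - 1, acc + n);
}

/* Roughly what the optimized recursion behaves like: a loop with no calls. */
uint64_t sum_to_loop(uint32_t n)
{
    uint64_t acc = 0;
    while (n != 0) {
        acc += n;
        n--;
    }
    return acc;
}
```

Without the optimization, `sum_to_rec` uses one stack frame per step; with it, the stack depth stays constant, which is also why a debugger or Valgrind would then show only a single frame for it.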

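As you have already discovered, disabling optimization makes debugging much easier. If you want to debug optimized code (for performance testing, perhaps), then, as @Zang MingJie has already said, you can disable this particular optimization with -fno-optimize-sibling-calls.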
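To see the effect on a small case, a sketch like the following contrasts a call in tail position with one that is not. The function names are hypothetical, and the described code generation is what an optimizing compiler may do, not a guarantee.

```c
#include <stdlib.h>

/* Hypothetical example. malloc is in tail position: with optimization enabled,
   the compiler may emit a jump to malloc instead of a call, so no frame for
   alloc_tail remains on the stack while malloc runs. */
void *alloc_tail(void)
{
    return malloc(42);
}

/* Work is done after malloc returns, so this cannot be a tail call and the
   frame of alloc_kept stays on the stack during the allocation. */
void *alloc_kept(void)
{
    char *p = malloc(42);
    if (p != NULL)
        p[0] = '\0';
    return p;
}
```

Building once with something like `gcc -O2 -g` and once with `gcc -O2 -g -fno-optimize-sibling-calls`, then comparing the disassembly of `alloc_tail`, should show whether the call was turned into a jump. The exact result depends on the compiler version and target.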
