
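I have been trying to debug a service that crashes with a segmentation fault. I do not have access to the production server, so I installed a SIGSEGV handler in the service that prints a stack trace to the log file. Here is the stack trace from a crash: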

0# 0x00000000005054DA in ./afiniti_lookup
1# 0x00007F2BBB74A400 in /usr/lib64/libc.so.6
2# 0x00007F2BBB86F9BD in /usr/lib64/libc.so.6
3# 0x000000000041BB52 in ./afiniti_lookup
4# std::string::_M_move(char*, char const*, unsigned long) in ./afiniti_lookup
5# std::string::_M_mutate(unsigned long, unsigned long, unsigned long) in ./afiniti_lookup
6# std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) in ./afiniti_lookup
7# std::string::assign(char const*, unsigned long) in ./afiniti_lookup
8# std::string::assign(char const*) in ./afiniti_lookup
9# std::string::operator=(char const*) in ./afiniti_lookup
10# 0x000000000061E8E9 in ./afiniti_lookup
11# 0x0000000000620200 in ./afiniti_lookup
12# 0x000000000055B586 in ./afiniti_lookup
13# 0x00000000004F2BAC in ./afiniti_lookup
14# 0x00000000004F0715 in ./afiniti_lookup
15# 0x000000000051CDBF in ./afiniti_lookup
16# 0x0000000000529869 in ./afiniti_lookup
17# 0x0000000000464968 in ./afiniti_lookup
18# 0x0000000000461369 in ./afiniti_lookup
19# 0x0000000000460D6E in ./afiniti_lookup
20# 0x0000000000460086 in ./afiniti_lookup
21# 0x000000000045FD36 in ./afiniti_lookup
22# 0x000000000046CAB4 in ./afiniti_lookup
23# 0x000000000046B4F6 in ./afiniti_lookup
24# 0x000000000046FF13 in ./afiniti_lookup
25# 0x000000000046FE65 in ./afiniti_lookup
26# 0x000000000046FCDA in ./afiniti_lookup
27# 0x00007F2BBCE5038F in /opt/lib64/libcpprest.so.2.10
28# 0x00007F2BBEDCAEA5 in /usr/lib64/libpthread.so.0
29# clone in /usr/lib64/libc.so.6

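However, this trace is not much use, because I cannot pinpoint where in the code the problem occurs. Can someone help me understand and examine this stack trace better?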


1 Answer


"Can someone help me understand and examine this stack trace better?"

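It looks like you have a partially stripped executable in production.

You should have an unstripped copy, which is what linking your executable produces. If you don't, you need to change your ways and save a copy before you run strip.

With the unstripped copy, you can make sense of the stack trace like so: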

addr2line -fe afiniti_lookup.unstripped 0x61E8E9 0x620200 0x55B586 ...
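
Typing the addresses out by hand gets tedious for a 30-frame trace. Below is a minimal sketch of a helper that pulls them out of a saved trace, assuming the log lines have exactly the "N# 0xADDR in BINARY" shape shown in the question. The function name extract_addrs is made up for illustration. Frames that already show a symbol name (such as the std::string ones) and frames that belong to shared libraries are skipped, since only raw addresses inside the main executable need resolving against the unstripped copy.

```shell
# Hypothetical helper: print the raw frame addresses that belong to one binary.
# $1 = file holding the saved trace, $2 = binary path exactly as the trace prints it
extract_addrs() {
    # Keep lines whose last field is the binary and whose second field is a
    # bare hex address; lines that already carry a symbol name do not match.
    awk -v bin="$2" '$NF == bin && $2 ~ /^0x[0-9A-Fa-f]+$/ { print $2 }' "$1"
}
```

With the trace saved as crash.log (a hypothetical file name), the addresses can then be piped straight in: extract_addrs crash.log ./afiniti_lookup | xargs addr2line -Cfie afiniti_lookup.unstripped. Here -C demangles C++ names and -i also lists inlined callers, which tends to matter for optimized C++ builds.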

Here is a worked example:

cat foo.c

int foo() { int *ip = 0; return *ip; }
int bar() { return foo(); }
int zoo() { return bar(); }
int main() { return zoo(); }

Compile it with debug info: gcc -g foo.c (produces a.out).
Strip the binary for "production": strip --strip-all a.out -o b.out.

Run b.out under GDB to simulate the production stack trace:

(gdb) run
Starting program: /tmp/b.out

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401112 in ?? ()
(gdb) bt
#0  0x0000000000401112 in ?? ()
#1  0x0000000000401124 in ?? ()
#2  0x0000000000401134 in ?? ()
#3  0x0000000000401144 in ?? ()
#4  0x00007ffff7dfbcca in __libc_start_main (main=0x401136, argc=1, argv=0x7fffffffdc98, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdc88) at ../csu/libc-start.c:308
#5  0x000000000040104a in ?? ()

Now use addr2line on the unstripped binary to make sense of the stack trace above:

addr2line -fe a.out 0x0000000000401112 0x0000000000401124 0x0000000000401134 0x0000000000401144
foo
/tmp/foo.c:1
bar
/tmp/foo.c:2
zoo
/tmp/foo.c:3
main
/tmp/foo.c:4

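P.S. For real production use, you would ideally compile your binary with gcc -O2 -g ..., so that you have full debug info, and then strip the binary (but keep a copy with full debug info). That way you can debug core dumps from production fairly easily, with access to functions, files, lines and variables.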
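
As a rough sketch of that build step, the two commands can be wrapped so that the full-debug copy is never forgotten. The function name build_release and the ".unstripped" suffix are illustrative choices, not part of any tool, and a real project would run its own compile and link commands in place of the single gcc call.

```shell
# Hypothetical helper: build with optimization and debug info, keep the
# unstripped binary, and write a stripped copy for deployment.
# $1 = C source file, $2 = output path for the deployable binary
build_release() {
    src=$1
    out=$2
    # Full-debug binary: archive this next to the release, never deploy it.
    gcc -O2 -g "$src" -o "$out.unstripped" || return 1
    # Stripped binary: this is the one that ships to production.
    strip --strip-all "$out.unstripped" -o "$out"
}
```

For the example above, build_release foo.c afiniti_lookup would leave afiniti_lookup for deployment and afiniti_lookup.unstripped for use with addr2line or GDB.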

Answered 2020-10-16T03:58:38.580