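All,

I am debugging a 24-thread program with GDB. I have found the line in the code where the error occurs, but I cannot tell from GDB's output what the error is. The following line triggers the error; it is just an ordinary insertion into a map.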

current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));

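I used GDB to find which thread the error occurred in and switched to that thread; the backtrace command then shows the function calls on the stack. (The last few lines are my attempts to print the values of some variables in the function, which failed.)

What should I do to find out exactly what error occurred?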

[root@localhost nameComponentEncoding]# gdb NCE_david
GNU gdb (GDB) Fedora (7.2.90.20110429-36.fc15)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david...done.
(gdb) r /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
Starting program: /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffd2bf5700 (LWP 13129)]
[New Thread 0x7fffd23f4700 (LWP 13130)]
[New Thread 0x7fffd1bf3700 (LWP 13131)]
[New Thread 0x7fffd13f2700 (LWP 13132)]
[New Thread 0x7fffd0bf1700 (LWP 13133)]
[New Thread 0x7fffd03f0700 (LWP 13134)]
[New Thread 0x7fffcfbef700 (LWP 13135)]
[New Thread 0x7fffcf3ee700 (LWP 13136)]
[New Thread 0x7fffcebed700 (LWP 13137)]
[New Thread 0x7fffce3ec700 (LWP 13138)]
[New Thread 0x7fffcdbeb700 (LWP 13139)]
[New Thread 0x7fffcd3ea700 (LWP 13140)]
[New Thread 0x7fffccbe9700 (LWP 13141)]
[New Thread 0x7fffcc3e8700 (LWP 13142)]
[New Thread 0x7fffcbbe7700 (LWP 13143)]
[New Thread 0x7fffcb3e6700 (LWP 13144)]
[New Thread 0x7fffcabe5700 (LWP 13145)]
[New Thread 0x7fffca3e4700 (LWP 13146)]
[New Thread 0x7fffc9be3700 (LWP 13147)]
[New Thread 0x7fffc93e2700 (LWP 13148)]
[New Thread 0x7fffc8be1700 (LWP 13149)]
[New Thread 0x7fffc83e0700 (LWP 13150)]
[New Thread 0x7fffc7bdf700 (LWP 13151)]
this is thread 1
this is thread 7
this is thread 14
this is thread 18
this is thread 2
this is thread 19
this is thread 6
this is thread 8
this is thread 24
base: 64312646
this is thread 11
this is thread 5
this is thread 12
this is thread 13
this is thread 3
this is thread 15
this is thread 16
this is thread 17
this is thread 4
this is thread 20
this is thread 21
this is thread 22
this is thread 23
this is thread 9
this is thread 10

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc8be1700 (LWP 13149)]
std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126         __x->_M_right = __y->_M_left;
(gdb) info threads
  Id   Target Id         Frame
  24   Thread 0x7fffc7bdf700 (LWP 13151) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
    at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
  (... other 22 threads not listed)
  2    Thread 0x7fffd2bf5700 (LWP 13129) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
    at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
  1    Thread 0x7ffff7fe57a0 (LWP 13126) "NCE_david" strtok () at ../sysdeps/x86_64/strtok.S:76
(gdb) thread 22
[Switching to thread 22 (Thread 0x7fffc8be1700 (LWP 13149))]
#0  std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126         __x->_M_right = __y->_M_left;

(gdb) bt
#0  std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
#1  0x0000003cdd26e848 in std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x7fffc0005ba0, __p=<optimized out>, __header=...)
    at ../../../../libstdc++-v3/src/tree.cc:266
#2  0x00000000004029ca in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_ (this=0x608108, __x=<optimized out>, __p=0x16cd3e30, __v=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_pair.h:87
#3  0x0000000000402b7d in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_unique (this=0x608108, __v=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_tree.h:1281
#4  0x000000000040444c in insert (__x=..., this=0x608108) at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_map.h:518
#5  ComponentTrie::add_prefix (this=0x7fffffffe2e0, prefix_input=<optimized out>, port=10) at ComponentTrie_david.cpp:112
#6  0x0000000000401c3b in main._omp_fn.0 () at NameComponentEncoding_david.cpp:277
#7  0x0000003cd2607fea in gomp_thread_start (xdata=<optimized out>) at ../../../libgomp/team.c:115
#8  0x0000003cd0607cd1 in start_thread (arg=0x7fffc8be1700) at pthread_create.c:305
#9  0x0000003cd02dfd3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) p 'ComponentTrie::add_prefix(char*, int)'::comps[j]
No symbol "comps" in specified context.
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::prefix
No symbol "prefix" in specified context.

Edit: I have run the code under valgrind --tool=memcheck; the results are below.

[root@localhost nameComponentEncoding]# valgrind --tool=memcheck ./NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
(... many lines omitted)
==13261==
==13261== Thread 11:
==13261== Invalid read of size 1
==13261==    at 0x3CD02849BC: strtok (strtok.S:141)
==13261==    by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261==    by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261==    by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261==    by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261==    by 0x3CD02DFD3C: clone (clone.S:115)
==13261==  Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
==13261== Invalid read of size 1
==13261==    at 0x3CD02849EC: strtok (strtok.S:167)
==13261==    by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261==    by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261==    by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261==    by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261==    by 0x3CD02DFD3C: clone (clone.S:115)
==13261==  Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
Insertion and lookup cost time(us): 994669532   67108864        14.821731       0.067469
component number:4849478, state number: 2545847
Parallel threads:24
==13261==
==13261== HEAP SUMMARY:
==13261==     in use at exit: 4,239,081,584 bytes in 76,746,193 blocks
==13261==   total heap usage: 80,050,114 allocs, 3,303,921 frees, 4,323,622,103 bytes allocated
==13261==
==13261== LEAK SUMMARY:
==13261==    definitely lost: 0 bytes in 0 blocks
==13261==    indirectly lost: 0 bytes in 0 blocks
==13261==      possibly lost: 4,111,951,106 bytes in 74,746,429 blocks
==13261==    still reachable: 127,130,478 bytes in 1,999,764 blocks
==13261==         suppressed: 0 bytes in 0 blocks
==13261== Rerun with --leak-check=full to see details of leaked memory
==13261==
==13261== For counts of detected and suppressed errors, rerun with: -v
==13261== Use --track-origins=yes to see where uninitialised values come from
==13261== ERROR SUMMARY: 45 errors from 30 contexts (suppressed: 6 from 6)

1 Answer

We know that the program segfaults on this line:

current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));

From the stack trace, we know that the segfault happens deep inside the red-black tree implementation behind std::map:

#0  std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126         __x->_M_right = __y->_M_left;

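This means that:

  1. The segfault may be caused by:
    1. Evaluating __x->_M_right
    2. Evaluating __y->_M_left
    3. Storing the right-hand side into the left-hand side in __x->_M_right = __y->_M_left
  2. std::map::insert() was reached, so the segfault did not occur while the arguments to the call were being constructed. In particular, evaluating comps[j] did not crash, so an out-of-bounds index there is not the immediate cause.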

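This leads me to think that, by this point, your heap has already been corrupted by an earlier memory error, and that the crash in std::map::insert() is a symptom rather than the cause.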
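To make the possible fault points concrete, here is a simplified sketch of a left rotation on a binary tree with parent links. It is not the libstdc++ source: TreeNode and rotate_left are hypothetical names, and only the pointer manipulation is modelled. The second statement has the same shape as the faulting one, and it dereferences both x and y, so a node that another thread modified or that was overwritten earlier can crash here even though the insert call itself is valid.

```cpp
#include <cassert>

// Hypothetical, simplified tree node (no colour, no key); for illustration only.
struct TreeNode {
    TreeNode* parent = nullptr;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Left-rotates the subtree rooted at x.
// Precondition: x and x->right are valid, non-null nodes.
void rotate_left(TreeNode* x, TreeNode*& root) {
    TreeNode* y = x->right;   // reads through x
    x->right = y->left;       // reads through y, writes through x (same shape as the faulting line)
    if (y->left != nullptr) {
        y->left->parent = x;
    }
    y->parent = x->parent;
    if (x == root) {
        root = y;
    } else if (x == x->parent->left) {
        x->parent->left = y;
    } else {
        x->parent->right = y;
    }
    y->left = x;
    x->parent = y;
}
```

If x->right is null or points to freed memory when the rotation starts, the read of y->left is where the process dies, which matches the shape of frame #0 in the backtrace.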

Run your program under the Valgrind memcheck tool:

$ valgrind --tool=memcheck /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt

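Then read Valgrind's output carefully to find the first memory error in your program.

Valgrind is implemented as a virtual CPU, so your program will run about 30 times slower. This is time-consuming, but it should let you make progress on the problem.

In addition to Valgrind, you may also want to try enabling debug mode for the libstdc++ containers:

To use the libstdc++ debug mode, compile your application with the compiler flag -D_GLIBCXX_DEBUG. Note that this flag changes the sizes and behavior of standard class templates such as std::vector, and therefore you can only link code compiled with debug mode and code compiled without debug mode if no instantiation of a container is passed between the two translation units.

If your program does not use external libraries, then rebuilding the whole program with -D_GLIBCXX_DEBUG added to CXXFLAGS in the Makefile should work. Otherwise, you need to know whether C++ containers are passed between components compiled with and without the debug flag.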

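Valgrind log review

I am surprised that you use strtok() in a multithreaded program. Is ComponentTrie::add_prefix() never called from two threads at the same time? While fixing the invalid reads by checking how strtok() is used at ComponentTrie_david.cpp:99, you may also want to replace strtok() with strtok_r().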
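As a rough illustration of that replacement, the sketch below splits a prefix into components with POSIX strtok_r(), which keeps its position in a caller-supplied pointer instead of in hidden static state shared by all threads. The function name split_components and the "/" delimiter are assumptions; the question does not show how add_prefix() actually tokenizes its input.

```cpp
#include <cassert>
#include <string.h>   // strtok_r (POSIX)
#include <string>
#include <vector>

// Hypothetical helper: splits `prefix` into components on any character in `delims`.
// The "/" default is an assumption about the input format.
std::vector<std::string> split_components(const char* prefix, const char* delims = "/") {
    std::vector<std::string> comps;
    if (prefix == nullptr) {
        return comps;
    }
    // strtok_r writes NUL bytes into its input, so work on a private copy.
    std::string buffer(prefix);
    char* save = nullptr;  // per-call state; this is what makes strtok_r reentrant
    for (char* tok = strtok_r(&buffer[0], delims, &save);
         tok != nullptr;
         tok = strtok_r(nullptr, delims, &save)) {
        comps.push_back(tok);
    }
    return comps;
}
```

Because the copy and the save pointer are local to each call, two threads can tokenize different prefixes at the same time without disturbing each other.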

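Concurrent access to STL containers

Standard C++ containers are explicitly documented as not doing any thread synchronization:

The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state. An object will be modified by invoking a non-const member function on it or passing it as a non-const argument to a library function. An object will not be modified by invoking a const member function on it or passing it to a function as a pointer-to-const or reference-to-const. Typically, the application programmer may infer what object locks must be held based on the objects referenced in a function call and whether the objects are accessed as const or non-const.

(This is from the GNU libstdc++ documentation, but the C++11 standard specifies essentially the same behavior.) Concurrent modification of std::map and other containers is a serious error, and it is the likely culprit behind the crash. Protect each container with its own pthread_mutex_t, or use the OpenMP synchronization mechanisms.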
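A minimal sketch of the OpenMP option follows, assuming a parallel loop that inserts into one shared map. Node, insert_children, free_children and the critical-section name children_insert are all hypothetical stand-ins, since the real ComponentTrieNode is not shown in the question. Note that a named critical section is one program-wide lock, not a per-container lock, so it is the simplest fix rather than the most scalable one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for ComponentTrieNode.
struct Node {
    std::map<std::string, Node*> children;
};

// Inserts `count` distinct keys into root.children from an OpenMP parallel loop
// and returns the resulting number of children.
std::size_t insert_children(Node& root, int count) {
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        // Work that touches no shared state stays outside the lock.
        std::string key = "comp" + std::to_string(i);
        Node* child = new Node();
        // Only one thread at a time may modify the shared map.
        #pragma omp critical(children_insert)
        {
            root.children.insert(std::make_pair(key, child));
        }
    }
    return root.children.size();
}

// Releases the nodes allocated by insert_children.
void free_children(Node& root) {
    for (auto& entry : root.children) {
        delete entry.second;
    }
    root.children.clear();
}
```

Built with -fopenmp the loop runs in parallel; without that flag the pragmas are ignored and the same code runs serially, which makes it easy to compare the two behaviours. Any lookup that can run concurrently with an insert needs to take the same lock.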

answered 2013-05-26T14:48:51.860