考虑以下 C++ 程序。我希望调用的第一个线程exit将终止程序。这就是我用g++ -g test.cxx -lpthread. 但是,当我链接到 TCMalloc ( g++ -g test.cxx -lpthread -ltcmalloc) 时,它会挂起。为什么?
检查堆栈帧表明,第一个调用的线程exit卡在__unregister_atfork等待某种引用计数变量达到 0。由于它之前获得了互斥锁,所有其他线程都陷入了死锁。我的猜测是 betweek tcmalloc 的 atfork 处理程序和我的代码之间存在某种交互。
使用 gperftools 2.0 在 CentOS 6.4 上测试。
$ cat test.cxx
#include <unistd.h>
#include <iostream>
#include <pthread.h>
#include <stdlib.h>
using namespace std;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static void* task(void*) {
if (fork() == 0)
return NULL;
pthread_mutex_lock(&m);
exit(0);
}
int main(int argc, char **argv) {
cout << getpid() << endl;
pthread_t t;
for (unsigned i = 0; i < 100; ++i) {
pthread_create(&t, NULL, task, NULL);
}
sleep(9999);
}
$ g++ -g test.cxx -lpthread && $ ./a.out
19515
$ g++ -g test.cxx -lpthread -ltcmalloc && ./a.out
24252
<<< process hangs indefinitely >>>
^C
$ pstack 24252
Thread 101 (Thread 0x7ffaabdf7700 (LWP 24253)):
#0 0x000000328c4f84c4 in __unregister_atfork () from /lib64/libc.so.6
#1 0x00007ffaac02d2c6 in __do_global_dtors_aux () from /usr/lib64/libtcmalloc.so.4
#2 0x0000000000000000 in ?? ()
Thread 100 (Thread 0x7ffaab3f6700 (LWP 24254)):
#0 0x000000328cc0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000328cc09388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x000000328cc09257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000400abf in task(void*) ()
#4 0x000000328cc07851 in start_thread () from /lib64/libpthread.so.0
#5 0x000000328c4e894d in clone () from /lib64/libc.so.6
<<< the other 98 threads are also deadlocked >>>
Thread 1 (Thread 0x7ffaabdf9740 (LWP 24252)):
#0 0x000000328c4acbcd in nanosleep () from /lib64/libc.so.6
#1 0x000000328c4aca40 in sleep () from /lib64/libc.so.6
#2 0x0000000000400b33 in main ()
编辑:我认为问题可能exit是不是线程安全的。根据POSIX,exit是线程安全的。但是,glibc 文档指出这不是exit线程安全的。