我在 Python 中有一个非常简单的脚本,但由于某种原因,在运行大量数据时出现以下错误:
*** glibc detected *** python: double free or corruption (out): 0x00002af5a00cc010 ***
当一个人试图释放已经被释放的内存时,我已经习惯了 C 或 C++ 中出现的这些错误。但是,根据我对 Python 的理解(尤其是我编写代码的方式),我真的不明白为什么会发生这种情况。
这是代码:
#!/usr/bin/python -tt
import sys, commands, string
import numpy as np
import scipy.io as io
from time import clock
W = io.loadmat(sys.argv[1])['W']
size = W.shape[0]
numlabels = int(sys.argv[2])
Q = np.zeros((size, numlabels), dtype=np.double)
P = np.zeros((size, numlabels), dtype=np.double)
Q += 1.0 / Q.shape[1]
nu = 0.001
mu = 0.01
start = clock()
mat = -nu + mu*(W*(np.log(Q)-1))
end = clock()
print >> sys.stderr, "Time taken to compute matrix: %.2f seconds"%(end-start)
有人可能会问,为什么要声明一个 P 和一个 Q numpy 数组?我这样做只是为了反映实际情况(因为这段代码只是我实际所做的一部分,我需要一个 P 矩阵并事先声明它)。
我可以使用 192GB 的机器,所以我在一个非常大的 SciPy 稀疏矩阵(220 万乘 220 万,但非常稀疏,这不是问题)上对此进行了测试。主内存由 Q、P 和矩阵矩阵占用,因为它们都是 220 万乘以 2000 矩阵(大小 = 220 万,numlabels = 2000)。峰值内存高达 131GB,可轻松放入内存。在计算 mat 矩阵时,我收到 glibc 错误,并且我的进程自动进入睡眠 (S) 状态,而不会释放它占用的 131GB。
考虑到奇怪的(对于 Python)错误(我没有明确地取消分配任何东西),而且这对于较小的矩阵大小(到 2000 年大约 150 万)非常有效,我真的不知道从哪里开始调试它。
作为起点,我在运行前设置了“ulimit -s unlimited”,但无济于事。
欢迎任何帮助或洞察 numpy 在大量数据中的行为。
请注意,这不是内存不足错误 - 我有 196GB,我的进程达到了大约 131GB 并在此停留了一段时间,然后才给出以下错误。
更新:2013 年 2 月 16 日(太平洋标准时间下午 1:10):
根据建议,我使用 GDB 运行 Python。有趣的是,在一次 GDB 运行中,我忘记将堆栈大小限制设置为“无限”,并得到以下输出:
*** glibc detected *** /usr/bin/python: munmap_chunk(): invalid pointer: 0x00007fe7508a9010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x733b6)[0x7ffff6ec23b6]
/usr/lib64/python2.7/site-packages/numpy/core/multiarray.so(+0x4a496)[0x7ffff69fc496]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4e67)[0x7ffff7af48c7]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x309)[0x7ffff7af6c49]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7ffff7b25592]
/usr/lib64/libpython2.7.so.1.0(+0xfcc61)[0x7ffff7b33c61]
/usr/lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0x84)[0x7ffff7b34074]
/usr/lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x189)[0x7ffff7b347c9]
/usr/lib64/libpython2.7.so.1.0(Py_Main+0x36c)[0x7ffff7b3e1bc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7ffff6e6dbfd]
/usr/bin/python[0x4006e9]
======= Memory map: ========
00400000-00401000 r-xp 00000000 09:01 50336181 /usr/bin/python2.7
00600000-00601000 r--p 00000000 09:01 50336181 /usr/bin/python2.7
00601000-00602000 rw-p 00001000 09:01 50336181 /usr/bin/python2.7
00602000-00e5f000 rw-p 00000000 00:00 0 [heap]
7fdf2584c000-7ffff0a66000 rw-p 00000000 00:00 0
7ffff0a66000-7ffff0a6b000 r-xp 00000000 09:01 50333916 /usr/lib64/python2.7/lib-dynload/mmap.so
7ffff0a6b000-7ffff0c6a000 ---p 00005000 09:01 50333916 /usr/lib64/python2.7/lib-dynload/mmap.so
7ffff0c6a000-7ffff0c6b000 r--p 00004000 09:01 50333916 /usr/lib64/python2.7/lib-dynload/mmap.so
7ffff0c6b000-7ffff0c6c000 rw-p 00005000 09:01 50333916 /usr/lib64/python2.7/lib-dynload/mmap.so
7ffff0c6c000-7ffff0c77000 r-xp 00000000 00:12 54138483 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/streams.so
7ffff0c77000-7ffff0e76000 ---p 0000b000 00:12 54138483 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/streams.so
7ffff0e76000-7ffff0e77000 r--p 0000a000 00:12 54138483 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/streams.so
7ffff0e77000-7ffff0e78000 rw-p 0000b000 00:12 54138483 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/streams.so
7ffff0e78000-7ffff0e79000 rw-p 00000000 00:00 0
7ffff0e79000-7ffff0e9b000 r-xp 00000000 00:12 54138481 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio5_utils.so
7ffff0e9b000-7ffff109a000 ---p 00022000 00:12 54138481 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio5_utils.so
7ffff109a000-7ffff109b000 r--p 00021000 00:12 54138481 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio5_utils.so
7ffff109b000-7ffff109f000 rw-p 00022000 00:12 54138481 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio5_utils.so
7ffff109f000-7ffff10a0000 rw-p 00000000 00:00 0
7ffff10a0000-7ffff10a5000 r-xp 00000000 09:01 50333895 /usr/lib64/python2.7/lib-dynload/zlib.so
7ffff10a5000-7ffff12a4000 ---p 00005000 09:01 50333895 /usr/lib64/python2.7/lib-dynload/zlib.so
7ffff12a4000-7ffff12a5000 r--p 00004000 09:01 50333895 /usr/lib64/python2.7/lib-dynload/zlib.so
7ffff12a5000-7ffff12a7000 rw-p 00005000 09:01 50333895 /usr/lib64/python2.7/lib-dynload/zlib.so
7ffff12a7000-7ffff12ad000 r-xp 00000000 00:12 54138491 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio_utils.so
7ffff12ad000-7ffff14ac000 ---p 00006000 00:12 54138491 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio_utils.so
7ffff14ac000-7ffff14ad000 r--p 00005000 00:12 54138491 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio_utils.so
7ffff14ad000-7ffff14ae000 rw-p 00006000 00:12 54138491 /home/avneesh/.local/lib/python2.7/site-packages/scipy/io/matlab/mio_utils.so
7ffff14ae000-7ffff14b5000 r-xp 00000000 00:12 54138562 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_csgraph.so
7ffff14b5000-7ffff16b4000 ---p 00007000 00:12 54138562 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_csgraph.so
7ffff16b4000-7ffff16b5000 r--p 00006000 00:12 54138562 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_csgraph.so
7ffff16b5000-7ffff16b6000 rw-p 00007000 00:12 54138562 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_csgraph.so
7ffff16b6000-7ffff17c2000 r-xp 00000000 00:12 54138558 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_bsr.so
7ffff17c2000-7ffff19c2000 ---p 0010c000 00:12 54138558 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_bsr.so
7ffff19c2000-7ffff19c3000 r--p 0010c000 00:12 54138558 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_bsr.so
7ffff19c3000-7ffff19c6000 rw-p 0010d000 00:12 54138558 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_bsr.so
7ffff19c6000-7ffff19d5000 r-xp 00000000 00:12 54138561 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_dia.so
7ffff19d5000-7ffff1bd4000 ---p 0000f000 00:12 54138561 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_dia.so
7ffff1bd4000-7ffff1bd5000 r--p 0000e000 00:12 54138561 /home/avneesh/.local/lib/python2.7/site-packages/scipy/sparse/sparsetools/_dia.so
Program received signal SIGABRT, Aborted.
0x00007ffff6e81ab5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff6e81ab5 in raise () from /lib64/libc.so.6
#1 0x00007ffff6e82fb6 in abort () from /lib64/libc.so.6
#2 0x00007ffff6ebcdd3 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff6ec23b6 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff69fc496 in ?? () from /usr/lib64/python2.7/site-packages/numpy/core/multiarray.so
#5 0x00007ffff7af48c7 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#6 0x00007ffff7af6c49 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#7 0x00007ffff7b25592 in PyEval_EvalCode () from /usr/lib64/libpython2.7.so.1.0
#8 0x00007ffff7b33c61 in ?? () from /usr/lib64/libpython2.7.so.1.0
#9 0x00007ffff7b34074 in PyRun_FileExFlags () from /usr/lib64/libpython2.7.so.1.0
#10 0x00007ffff7b347c9 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.7.so.1.0
#11 0x00007ffff7b3e1bc in Py_Main () from /usr/lib64/libpython2.7.so.1.0
#12 0x00007ffff6e6dbfd in __libc_start_main () from /lib64/libc.so.6
#13 0x00000000004006e9 in _start ()
当我将堆栈大小限制设置为无限制”时,我得到以下信息:
*** glibc detected *** /usr/bin/python: double free or corruption (out): 0x00002abb2732c010 ***
^X^C
Program received signal SIGINT, Interrupt.
0x00002aaaab9d08fe in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0 0x00002aaaab9d08fe in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00002aaaab969f2e in _L_lock_9927 () from /lib64/libc.so.6
#2 0x00002aaaab9682d1 in free () from /lib64/libc.so.6
#3 0x00002aaaaaabbfe2 in _dl_scope_free () from /lib64/ld-linux-x86-64.so.2
#4 0x00002aaaaaab70a4 in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#5 0x00002aaaaaabcaa0 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#6 0x00002aaaaaab85f6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#7 0x00002aaaaaabc5da in _dl_open () from /lib64/ld-linux-x86-64.so.2
#8 0x00002aaaab9fb530 in do_dlopen () from /lib64/libc.so.6
#9 0x00002aaaaaab85f6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00002aaaab9fb5cf in dlerror_run () from /lib64/libc.so.6
#11 0x00002aaaab9fb637 in __libc_dlopen_mode () from /lib64/libc.so.6
#12 0x00002aaaab9d60c5 in init () from /lib64/libc.so.6
#13 0x00002aaaab080933 in pthread_once () from /lib64/libpthread.so.0
#14 0x00002aaaab9d61bc in backtrace () from /lib64/libc.so.6
#15 0x00002aaaab95dde7 in __libc_message () from /lib64/libc.so.6
#16 0x00002aaaab9633b6 in malloc_printerr () from /lib64/libc.so.6
#17 0x00002aaaab9682dc in free () from /lib64/libc.so.6
#18 0x00002aaaabef1496 in ?? () from /usr/lib64/python2.7/site-packages/numpy/core/multiarray.so
#19 0x00002aaaaad888c7 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#20 0x00002aaaaad8ac49 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#21 0x00002aaaaadb9592 in PyEval_EvalCode () from /usr/lib64/libpython2.7.so.1.0
#22 0x00002aaaaadc7c61 in ?? () from /usr/lib64/libpython2.7.so.1.0
#23 0x00002aaaaadc8074 in PyRun_FileExFlags () from /usr/lib64/libpython2.7.so.1.0
#24 0x00002aaaaadc87c9 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.7.so.1.0
#25 0x00002aaaaadd21bc in Py_Main () from /usr/lib64/libpython2.7.so.1.0
#26 0x00002aaaab90ebfd in __libc_start_main () from /lib64/libc.so.6
#27 0x00000000004006e9 in _start ()
这让我相信基本问题在于 numpy 多阵列核心模块(第一个输出中的第 4 行和第二个输出中的第 18 行)。为了以防万一,我将在 numpy 和 scipy 中将其作为错误报告提出。
有没有人见过这个?
更新:2013 年 2 月 17 日(太平洋标准时间下午 4:45)
我找到了一台可以运行代码的机器,它具有更新版本的 SciPy (0.11) 和 NumPy (1.7.0)。直接运行代码(没有 GDB)导致 seg 错误,没有任何输出到 stdout 或 stderr。再次通过 GDB 运行,我得到以下信息:
Program received signal SIGSEGV, Segmentation fault.
0x00002aaaabead970 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00002aaaabead970 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00002aaaac5fcd04 in PyDataMem_FREE (ptr=<optimized out>, $K8=<optimized out>) at numpy/core/src/multiarray/multiarraymodule.c:3510
#2 array_dealloc (self=0xc00ab7edbfc228fe) at numpy/core/src/multiarray/arrayobject.c:416
#3 0x0000000000498eac in PyEval_EvalFrameEx ()
#4 0x000000000049f1c0 in PyEval_EvalCodeEx ()
#5 0x00000000004a9081 in PyRun_FileExFlags ()
#6 0x00000000004a9311 in PyRun_SimpleFileExFlags ()
#7 0x00000000004aa8bd in Py_Main ()
#8 0x00002aaaabe4f76d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#9 0x000000000041b9b1 in _start ()
我知道这不如使用调试符号编译的 NumPy 有用,我会尝试这样做并稍后发布输出。