python - Trouble understanding python's gc.garbage (for tracing memory leaks)

Question

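Judging by its steadily increasing memory usage, one of my python applications appears to be leaking memory. My hypothesis is a cyclic reference somewhere, despite best efforts to avoid this. To isolate the problem, I am looking into ways to manually check for unreachable items, as a tool purely for debugging.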

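The gc module appears to be capable of the necessary tracking, so I tried the following code, which aims to compile a list of the unreachable items formed since the last call. The first call merely sets a base checkpoint and will not identify unreachable items.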

def unreachable():
  # first time setup
  import gc
  gc.set_threshold( 0 ) # only manual sweeps
  gc.set_debug( gc.DEBUG_SAVEALL ) # keep unreachable items as garbage
  gc.enable() # start gc if not yet running (is this necessary?)
  # operation
  if gc.collect() == 0:
    return 'no unreachable items'
  s = 'unreachable items:\n ' \
    + '\n '.join( '[%d] %s' % item for item in enumerate( gc.garbage ) )
  _deep_purge_list( gc.garbage ) # remove unreachable items
  return s # return unreachable items as text

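Here, _deep_purge_list is intended to break cycles and delete the objects manually. The implementation below handles some common cases but is not nearly watertight. My first question relates to this; see further down.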

def _deep_purge_list( garbage ):
  for item in garbage:
    if isinstance( item, dict ):
      item.clear()
    if isinstance( item, list ):
      del item[:]
    try:
      item.__dict__.clear()
    except:
      pass
  del garbage[:]

Based on very limited testing, the setup appears to function properly. The following cyclic reference is correctly reported once:

class A( object ):
  def __init__( self ):
    self.ref = self

print unreachable()
# no unreachable items

A()

print unreachable()
# unreachable items:
#  [0] <__main__.A object at 0xb74579ac>
#  [1] {'ref': <__main__.A object at 0xb74579ac>}

print unreachable()
# no unreachable items

However, with the following, something strange happens:

print unreachable()
# no unreachable items

import numpy

print unreachable()
# unreachable items:
#  [0] (<type '_ctypes.Array'>,)
#  [1] {'__module__': 'numpy.ctypeslib', '__dict__': <attribute '__dict__' of 'c_long_Array_1' objects>, '__weakref__': <attribute '__weakref__' of 'c_long_Array_1' objects>, '_length_': 1, '_type_': <class 'ctypes.c_long'>, '__doc__': None}
#  [2] <class 'numpy.ctypeslib.c_long_Array_1'>
#  [3] <attribute '__dict__' of 'c_long_Array_1' objects>
#  [4] <attribute '__weakref__' of 'c_long_Array_1' objects>
#  [5] (<class 'numpy.ctypeslib.c_long_Array_1'>, <type '_ctypes.Array'>, <type '_ctypes._CData'>, <type 'object'>)

print unreachable()
# unreachable items:
#  [0] (<type '_ctypes.Array'>,)
#  [1] {}
#  [2] <class 'c_long_Array_1'>
#  [3] (<class 'c_long_Array_1'>, <type '_ctypes.Array'>, <type '_ctypes._CData'>, <type 'object'>)

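Repeated calls keep returning that last result. The problem does not occur when unreachable is called for the first time after the import. However, at this point I have no reason to believe this issue is numpy-specific; my guess is that it exposes a flaw in my approach.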

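My questions:

Is there a better way to remove items in gc.garbage? Ideally, is there a way to have gc remove them, as it would (should?) have done in the absence of DEBUG_SAVEALL?
Can anybody explain the issue with the numpy import, and/or suggest a way to fix it?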

Afterthought:

It appears that the code below performs close to what was intended:

def unreachable():
  import gc
  gc.set_threshold( 0 )
  gc.set_debug( gc.DEBUG_LEAK )
  gc.enable()
  print 'collecting {{{'
  gc.collect()
  print '}}} done'

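However, for debugging I prefer rich string representations over the type/id output that gc provides. Furthermore, I would like to understand the flaw in my former approach, and learn something about the gc module.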

Thanks for your help,

Gertjan

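Update 06/05:

I ran into a situation where the first implementation did not report any unreachable items unless locals() was called just before it (discarding the return value). I do not understand how this could possibly affect gc's object tracking, so it leaves me even more confused. I am not sure how easy it will be to build a small example that demonstrates this issue, but I can give it a shot if needed.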

score 0 · Accepted Answer

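The last time I had a need like this, I ended up using the objgraph module, to good effect. It provides considerably more precise information than you can easily obtain directly from the gc module. Unfortunately, I don't have any code at hand to illustrate its usage.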
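For readers who want to see the general idea without installing anything, here is a minimal standard-library sketch of the kind of per-type growth report that objgraph offers ready-made (for instance through its show_growth helper). It is not objgraph's implementation; the names type_counts, growth, Leaky and _registry are made up for illustration, and the code is written for Python 3, unlike the Python 2 snippets in the question.

```python
import gc
from collections import Counter


def type_counts():
    """Count the objects currently tracked by gc, keyed by type name."""
    gc.collect()  # drop collectable garbage first so it does not skew the counts
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return (type name, increase) pairs for types whose count went up."""
    delta = after - before  # Counter subtraction keeps only positive counts
    return delta.most_common()


class Leaky(object):
    """Hypothetical class standing in for whatever is actually leaking."""


_registry = []  # hypothetical module-level list that keeps instances alive

before = type_counts()
_registry.extend(Leaky() for _ in range(100))
after = type_counts()

for name, increase in growth(before, after):
    print('%-20s +%d' % (name, increase))
```

Taking a snapshot before and after a suspect operation and looking only at the difference tends to be easier to read than a raw dump of gc.garbage, since it also catches objects that are still reachable but never released.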

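The one place where it falls down is memory allocated by any C libraries that get invoked. For example, if a project uses PIL, it is very easy to leak memory by not properly releasing python objects that are backed by C data. How to properly close such objects depends on the C-backed module.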
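Coming back to the first question in the post: a plausible reading of the numpy output is that classes and tuples cannot be emptied by a helper such as _deep_purge_list (a class exposes no clearable __dict__ and tuples are immutable), so their cycles survive and are found again on every collection. The sketch below, written for Python 3 and using a hypothetical Node class and unreachable_report function, avoids purging by hand: it records the report, then switches DEBUG_SAVEALL off, empties gc.garbage and collects again, so gc disposes of the cycles itself.

```python
import gc


class Node(object):
    """Hypothetical class with a self-reference, like A in the question."""

    def __init__(self):
        self.ref = self  # the instance refers to itself, forming a cycle


def unreachable_report():
    """Describe unreachable objects, then hand them back to gc for disposal."""
    gc.set_debug(gc.DEBUG_SAVEALL)  # park unreachable objects in gc.garbage
    gc.collect()
    lines = ['[%d] %r' % (i, obj) for i, obj in enumerate(gc.garbage)]
    gc.set_debug(0)                 # stop saving, so gc may really free things
    del gc.garbage[:]               # drop the references held by gc.garbage
    gc.collect()                    # the cycles are unreachable again: free them
    return lines


unreachable_report()                # baseline call, result discarded
Node()                              # create a cycle and drop it immediately
print('\n'.join(unreachable_report()))
```

A second call right after this one should no longer list the Node instance, because it was actually freed rather than merely removed from the list. On Python 2, cycles containing objects with __del__ methods would still end up in gc.garbage even with the debug flags cleared, so treat this as a starting point rather than a complete answer.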
