python - pandas, HDFStore, and memory usage across load/unload cycles

Question

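I am happy using pandas to store and manipulate experimental data. I usually save things in the HDF format (which I have not mastered) through pd.HDFStore.

My dataframes keep growing, and I need to be more economical with memory.

I have read some of the guides linked from related questions, but I still cannot keep memory consumption sustainable, for example in this typical task of mine: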

. load some `df` into memory (on the order of 10GB)
. do some work on it together with another preloaded `df`
. unload
. repeat

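Evidently, I keep failing at the unload step.

So I would like you to consider the following experiment.

(Starting from a freshly launched kernel, in an IPython notebook, if that matters.)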

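import pandas as pd

for idx in range(6):
    print(idx)
    # Open the store, read the whole frame into memory, close the file
    store = pd.HDFStore('detection_DB_N.h5')
    detection_DB = store['detection_DB']
    store.close()

    # Drop the only reference to the frame
    del detection_DB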

Statistics (from top):

. memory used by first iteration ~8GB
. memory used at the end of execution ~10GB (6 cycles)
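The figures here were read off top by hand. A minimal sketch of logging the same quantity from inside the kernel follows. It assumes Linux (it reads `/proc/self/statm`), and the helper name `rss_mib` is made up for illustration.

```python
import os

def rss_mib():
    """Current resident set size of this process in MiB (Linux only)."""
    # Assumption: /proc/self/statm exists; its second field is the
    # number of resident pages.
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20

# Call this before the load, after the load, and after the del
# in each iteration of the loop.
print("RSS: %.0f MiB" % rss_mib())
```

Printing this value at each step of the loop would show whether the growth happens at load time or accumulates between iterations.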

Then, in the same kernel, I run

for idx in range(6):
    print(idx)
    store = pd.HDFStore('detection_DB_N.h5')
    detection_DB = store['detection_DB']
    store.close()

    # del detection_DB  # same as before, but without the del

Statistics:

. memory used at the end of execution ~15GB

Calling `del detection_DB` at this point has no effect on memory (CPU usage stays high for about 5 seconds).

Similarly, calling

import gc
gc.collect()

makes no relevant difference.
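One way to tell two situations apart, the object still being referenced somewhere versus the object being freed while the process keeps the memory, is a weak-reference probe. The sketch below uses a hypothetical `Payload` class as a stand-in for the DataFrame. It is only a diagnostic idea, not an explanation of the behaviour reported here.

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in for a large DataFrame."""
    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)

obj = Payload(10**6)
probe = weakref.ref(obj)   # does not keep obj alive

del obj
gc.collect()

# If the probe returns None, Python has destroyed the object; any memory
# still shown by top is then held by the allocator, not by a live object.
print("object freed:", probe() is None)
```

If the same probe on `detection_DB` returned None while top still reported the memory as used, the problem would lie in how memory is returned to the operating system rather than in a lingering reference.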

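I should add that, by repeating the previous calls, I ended up with about 20GB occupied (and no loaded object to work with).

Can anyone shed some light on this?

How can I get back to ~0GB (or thereabouts) occupied after the del?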
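One generic workaround, offered as a sketch rather than a confirmed fix for this case, is to do the load and the work in a short-lived child process, so that the operating system reclaims all of its memory when the process exits. The helper name `run_in_child` and the script `CHILD_SCRIPT` are made up for illustration; the file name and key are the ones from the experiment.

```python
import subprocess
import sys

# Hypothetical child script: loads the frame, prints a small summary, exits.
CHILD_SCRIPT = """
import sys
import pandas as pd

path, key = sys.argv[1], sys.argv[2]
with pd.HDFStore(path, mode="r") as store:
    df = store[key]
print(len(df))
"""

def run_in_child(script, *args):
    """Run `script` in a fresh Python process and return its stdout as text."""
    result = subprocess.run(
        [sys.executable, "-c", script] + list(args),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Usage with the file from the experiment (needs pandas and PyTables):
# n_rows = int(run_in_child(CHILD_SCRIPT, "detection_DB_N.h5", "detection_DB"))
```

The trade-off is that only small results (here a row count) can cheaply cross the process boundary, so this fits tasks where each cycle reduces the large frame to a compact summary.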
