ipython - MemoryError when sending data to ipyparallel engines

Question

We love IPython.parallel (now ipyparallel).

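However, one thing bugs me. When we send a pandas DataFrame of about 1.5 GB to a group of workers, we get a MemoryError if the cluster has many nodes. It looks as if there are as many copies of the DataFrame as there are engines (or some number proportional to it). Is there a way to avoid these copies?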
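To make the suspected copying concrete, here is a stdlib-only sketch contrasting two strategies: serializing an object separately for every target versus serializing it once and sharing the buffer. This is not ipyparallel's actual code path; `pickle` stands in for whatever serializer the client uses, and the helper names are hypothetical.

```python
import pickle


def serialize_per_target(obj, n_targets):
    """Serialize obj separately for every target (hypothetical helper).

    Each pickle.dumps call allocates a new bytes object, so the memory
    held by the sender grows linearly with n_targets.
    """
    return [pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
            for _ in range(n_targets)]


def serialize_once(obj, n_targets):
    """Serialize obj once and hand the same buffer to every target."""
    buf = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return [buf] * n_targets  # n_targets references to ONE bytes object


def distinct_bytes(buffers):
    """Total size of the distinct buffer objects actually held in memory."""
    unique = dict((id(b), b) for b in buffers)
    return sum(len(b) for b in unique.values())


if __name__ == "__main__":
    payload = list(range(100000))
    n = 8
    print(distinct_bytes(serialize_per_target(payload, n)))
    print(distinct_bytes(serialize_once(payload, n)))
```

With the per-target strategy the first number is `n` times the second, which is the kind of scaling described above.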

Example:

In[]: direct_view.push({'xy':xy}, block=True)
# or direct_view['xy'] = xy
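One possible mitigation is to push to a limited number of engines at a time instead of the whole view at once. This assumes that each engine's copy is held on the client only until its message has been delivered. The sketch relies on `DirectView.push(ns, targets=..., block=True)`. `push_in_batches` and `batch_size` are hypothetical names, and whether this actually caps peak memory in ipyparallel 4.0.2 is untested.

```python
def push_in_batches(view, ns, engine_ids, batch_size):
    """Push the namespace dict ns to engine_ids, batch_size engines at a time.

    view is expected to be an ipyparallel DirectView (e.g. client[:]).
    Each push blocks until its batch has been handled, so (assumption)
    at most batch_size copies should be in flight on the client at once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    engine_ids = list(engine_ids)
    for start in range(0, len(engine_ids), batch_size):
        batch = engine_ids[start:start + batch_size]
        view.push(ns, targets=batch, block=True)


# Hypothetical usage against a running cluster:
#   import ipyparallel as ipp
#   rc = ipp.Client()
#   push_in_batches(rc[:], {'xy': xy}, rc.ids, batch_size=10)
```

A smaller `batch_size` trades total transfer time for a lower client-side peak.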

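For a small cluster (e.g. 30 nodes), memory keeps growing, but the data eventually gets through and everything works. For larger clusters, however, e.g. 80 nodes (all r3.4xlarge, with only 1 engine each instead of n_core engines), htop reports memory growing to the maximum (123 GB), and we get: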

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-120-f6a9a69761db> in <module>()
----> 1 get_ipython().run_cell_magic(u'time', u'', u"ipc.direct_view.push({'xy':xy}, block=True)")

/opt/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2291             magic_arg_s = self.var_expand(line, stack_depth)
   2292             with self.builtin_trap:
-> 2293                 result = fn(magic_arg_s, cell)
   2294             return result
   2295 

(...)
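A back-of-the-envelope check of these numbers, assuming (as the question suspects) that the client briefly holds one full copy of the ~1.5 GB payload per engine. `copies_per_engine` is a hypothetical factor for any extra temporary serialization copies.

```python
def estimated_client_peak_gb(payload_gb, n_engines, copies_per_engine=1.0):
    """Rough client-side peak if every engine gets its own copy of the payload."""
    return payload_gb * n_engines * copies_per_engine


if __name__ == "__main__":
    for n in (30, 80):
        print("%d engines -> %.1f GB" % (n, estimated_client_peak_gb(1.5, n)))
    # 30 engines -> 45.0 GB
    # 80 engines -> 120.0 GB
```

120 GB is already close to the 123 GB reported by htop, so any additional temporary copy would exceed it. This is consistent with the per-engine-copy hypothesis but does not prove it.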

Note that, after reading https://ipyparallel.readthedocs.org/en/latest/details.html, we tried sending only the underlying numpy array (xy.values) in an attempt at a "non-copying send", but we still get a MemoryError.
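The idea behind a "non-copying send" is to hand the transport a view onto the existing array memory rather than a freshly serialized copy. The stdlib sketch below uses a `bytearray` as a stand-in for the buffer behind `xy.values` and shows the difference between a `memoryview`, which shares memory with its source, and `bytes(...)`, which copies it. The helper names are hypothetical, and this only concerns the client-side serialization step; it does not by itself prevent one message per engine.

```python
def zero_copy_view(buf):
    """Return a memoryview that shares memory with buf (no copy)."""
    return memoryview(buf)


def copied(buf):
    """Return an independent bytes copy of buf."""
    return bytes(buf)


if __name__ == "__main__":
    data = bytearray(b"\x00" * 8)  # stand-in for a numpy array's buffer
    view = zero_copy_view(data)
    snapshot = copied(data)
    data[0] = 255  # mutate the original buffer
    # The view sees the change; the copy does not.
    print(view.tobytes() == bytes(data), snapshot == bytes(data))
    # True False
```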

Versions:

Jupyter Notebook v.4.0.4
Python 2.7.10
ipyparallel.__version__: 4.0.2
