python - Python 多处理。池和内存

Question

我正在使用Pool.map评分程序：

来自数据源的数百万个数组的“光标”
计算
将结果保存在数据接收器中

结果是独立的。

我只是想知道是否可以避免内存需求。起初，似乎每个数组都进入了 python，然后 2 和 3 继续进行。无论如何，我的速度有所提高。

#data src and sink is in mongodb#
def scoring(some_arguments):
        ### some stuff  and finally persist  ###
    collection.update({uid:_uid},{'$set':res_profile},upsert=True)


cursor = tracking.find(timeout=False)
score_proc_pool = Pool(options.cores)    
#finaly I use a wrapper so I have only the document as input for map
score_proc_pool.map(scoring_wrapper,cursor,chunksize=10000)

我是在做错什么，还是有更好的方式使用 python 来实现这个目的？

score 1 · Accepted Answer

如果它没有属性，则 a的map函数在内部将可迭代对象转换为列表。相关代码在中，因为(and ) 使用它来产生结果 - 这也是一个列表。Pool__len__Pool.map_asyncPool.mapstarmap

如果您不想先将所有数据读入内存，则应使用Pool.imapor Pool.imap_unordered，它将生成一个迭代器，该迭代器将在结果进入时产生结果。

python - Python 多处理。池和内存

1 回答 1

Related

Reference