
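I'm trying to use Ray's parallelization model to process a file record by record. The code works, but the object store grows rapidly and the program eventually crashes. I'm avoiding ray.get(function.remote()) because it hurts performance: the job consists of a few million subtasks, and waiting for each one to finish adds overhead. Is there a way to set a global limit on the object store?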

import copy
import ray

# Calling ray.get() on every task constantly applies backpressure to the object store and frees space,
# but performance ends up worse than serial execution
for record in infile:
    ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename))

# Maximizes throughput, but the object store grows constantly
for record in infile:
    createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename)

# The called function returns either 0 or 1.

1 Answer


You can use ray.init(object_store_memory=10**9) to limit the object store to 1 GB.

There is more information in the memory management documentation at https://ray.readthedocs.io/en/latest/memory-management.html.

answered 2019-09-04T23:30:56.770