python - 有没有办法限制 Ray 对象存储的最大内存使用量

Question

我正在尝试利用 Ray 的并行化模型来逐条处理文件记录。代码运行良好，但对象存储增长迅速并最终崩溃。我避免使用 ray.get(function.remote()) 因为它会降低性能，因为该任务由几百万个子任务和等待任务完成的开销组成。有没有办法为对象存储设置全局限制？

#code which constantly backpressusre the obejct storage, freeing space, but causes performance to be worse than serial execution
for record in infile:
    ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict),dinucList,tetranucList, filename))

#code that maximizes throughput but makes the object storage grow constantly
for record in infile:
    createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict),dinucList,tetranucList, filename)

#the called function returns either 0 or 1.

score 5 · Accepted Answer

您可以ray.init(object_store_memory=10**9)限制对象存储使用 1GB。

在https://ray.readthedocs.io/en/latest/memory-management.html的内存管理文档中有更多信息。

python - 有没有办法限制 Ray 对象存储的最大内存使用量

1 回答 1

Related

Reference