
I'm using multiprocessing with a large (~5 GB) read-only dict used by the worker processes. I started by passing the whole dict to each process, but ran into memory constraints, so I changed to a multiprocessing Manager dict (after reading this question: How to share a dictionary between multiple processes in python without locking).

Since the change, performance has dived. What alternatives are there for a faster shared data store? Each key is a 40-character string, and each value is a tuple of two small strings.


1 Answer


Use a memory-mapped file. While that sounds crazy performance-wise, it may not be if you use a couple of clever tricks:

  1. Sort the records by key so that you can binary-search the file to locate one
  2. Try to make every line of the file the same length ("fixed-width records"); see the sketch below
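With fixed-width records, the lookup is a direct binary search over the mapped file. A minimal sketch, assuming a hypothetical sorted file records.dat whose records are a 40-byte ASCII key, two 20-byte space-padded value fields, and a trailing newline (the widths and file name are assumptions; adjust them to your data):

import mmap

KEY_LEN = 40
VAL_LEN = 20
REC_LEN = KEY_LEN + 2 * VAL_LEN + 1  # +1 for the trailing newline

def lookup(mm, key):
    # Binary search over fixed-width records in a sorted, memory-mapped file
    key = key.encode("ascii")
    lo, hi = 0, len(mm) // REC_LEN
    while lo < hi:
        mid = (lo + hi) // 2
        rec = mm[mid * REC_LEN:(mid + 1) * REC_LEN]
        k = rec[:KEY_LEN]
        if k == key:
            # Split the remainder into the two value fields and strip padding
            v1 = rec[KEY_LEN:KEY_LEN + VAL_LEN].rstrip().decode()
            v2 = rec[KEY_LEN + VAL_LEN:KEY_LEN + 2 * VAL_LEN].rstrip().decode()
            return (v1, v2)
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None

with open("records.dat", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(lookup(mm, "0" * 40))

Because the file is memory-mapped, the ~5 GB live in the page cache and are shared between all worker processes for free; each process only pays for the pages it actually touches.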

If you can't use fixed-width records, use this pseudocode:

Read 1 KB in the middle of the search range (or enough to be sure the longest line fits *twice*)
Find the first newline character
Find the next newline character
Get the line as the substring between the two positions
Check the key (the first 40 bytes)
If the key read is greater than the one sought, repeat with a 1 KB block in the lower half of the search range; otherwise repeat in the upper half
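Here is the same search as runnable Python, a minimal sketch assuming a hypothetical sorted file records.txt with one newline-terminated record per line: the 40-character key followed by the two values, tab-separated (the field layout and file name are assumptions; adapt the parsing to your format). Instead of reading a fixed 1 KB block, it aligns each probe to a full line with find/rfind, which amounts to the same idea:

import mmap

KEY_LEN = 40

def lookup(mm, key):
    # Binary search over sorted variable-length lines, by byte offset
    key = key.encode("ascii")
    lo, hi = 0, len(mm)  # lo is always the start of a line
    while lo < hi:
        mid = (lo + hi) // 2
        start = mm.rfind(b"\n", 0, mid) + 1  # start of the line containing mid
        end = mm.find(b"\n", start)
        if end == -1:
            end = len(mm)
        line = mm[start:end]
        k = line[:KEY_LEN]
        if k == key:
            # Assumed layout: key, value 1, value 2, separated by tabs
            return tuple(line[KEY_LEN + 1:].decode().split("\t"))
        if k < key:
            lo = end + 1  # key read is too small: search the upper half
        else:
            hi = start    # key read is too big: search the lower half
    return None

with open("records.txt", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(lookup(mm, "0" * 40))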

If that still isn't fast enough, consider writing an extension in C.

answered 2013-10-02T08:44:43.663