python - Python - Multiprocessing and shared memory

Question

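I am implementing a genetic algorithm with the Deap framework. The algorithm works, but I noticed that the multiprocess version of the GA is very memory-hungry: it uses 9 GB, against 2 GB for the single-process version. I suspect this is because memory is allocated separately for each process. In fact, memory usage increases as soon as the map is executed. Since the data shared between the processes is only read, they could all access the same memory.

This is the structure of my code.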

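import multiprocessing


def evaluate(individual, dataset=None):
    # Fitness of one individual: the penalty computed against the dataset.
    penalty = dataset.compute(individual)
    return penalty


def initialize():
    # toolbox is assumed to be the Deap toolbox created elsewhere.
    dataset = Dataset(file1, file2)

    # Evaluate the population in parallel through the pool's map.
    pool = multiprocessing.Pool()
    toolbox.register("map", pool.map)

    # Bind the dataset to every call of evaluate.
    toolbox.register("evaluate", evaluate, dataset=dataset)

    return toolbox, dataset


def main():
    toolbox, dataset = initialize()

    dataset.data = some_training_set
    fitnesses = toolbox.map(toolbox.evaluate, population)

    dataset.data = some_validation_set
    fitnesses = toolbox.map(toolbox.evaluate, population)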

Then I have a class that holds the dataset (read with pandas) and a dictionary.

class Dataset:

    def __init__(self, file1, file2):
        self.data = read(file1)
        self.dict = loadpickle(file2)

    def compute(self, individual):
        for row in self.data:
            # some stuff reading row and self.dict
            pass

What is the simplest way to share the memory? I tried using global variables for self.data and self.dict, but no...

score 2 · Accepted Answer

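The multiprocessing module uses a multi-process model rather than a threading model, so the processes cannot share memory (unless you use shared-memory IPC calls). If you need memory to be shared, you will have to rework the Deap framework to use threads.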
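As a rough illustration of the thread-based route, here is a minimal sketch that uses `multiprocessing.pool.ThreadPool` from the standard library, whose `map` has the same call shape as `multiprocessing.Pool.map`. The `Dataset` class below is a hypothetical stand-in with toy data and a toy penalty, not the class from the question, and `evaluate_population` is an illustrative helper name. Because threads live in a single process, every worker reads the same `dataset` object and nothing is copied. Keep in mind that CPython's global interpreter lock means pure-Python, CPU-bound evaluation may not run any faster with threads.

```python
from functools import partial
from multiprocessing.pool import ThreadPool


class Dataset:
    """Hypothetical stand-in for the question's Dataset class."""

    def __init__(self, data, lookup):
        self.data = data      # read-only rows (toy data, assumed)
        self.lookup = lookup  # read-only dictionary (toy data, assumed)

    def compute(self, individual):
        # Toy penalty: looked-up weight of each row times the individual.
        return sum(self.lookup.get(row, 0) * individual for row in self.data)


def evaluate(individual, dataset=None):
    return dataset.compute(individual)


def evaluate_population(population, dataset, workers=4):
    # Threads share the process memory, so `dataset` is never duplicated.
    bound = partial(evaluate, dataset=dataset)
    with ThreadPool(workers) as pool:
        return pool.map(bound, population)


if __name__ == "__main__":
    ds = Dataset(data=["a", "b", "a"], lookup={"a": 1, "b": 2})
    print(evaluate_population([1, 2, 3], ds))  # prints [4, 8, 12]
```

In the question's setup this would presumably mean creating a `ThreadPool` instead of a `multiprocessing.Pool` and passing its `map` to `toolbox.register("map", ...)`, while the rest of the code stays the same.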
