python - Wrap Multiprocess Pool Inside Loop (Shared Memory Between Processes)

Question

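I'm using the Python package "deap" to solve some multiobjective optimization problems with genetic algorithms. The functions can be quite expensive, and because of the evolutionary nature of GAs, the cost compounds quickly. The package does have some support for parallelizing the evolutionary computations with multiprocessing.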

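However, I'd like to go a step further and run the optimization several times, with different values for some of the optimization parameters. For example, I might want to solve the optimization problem with different weight values.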
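To make "different weight values" concrete: as far as I know, DEAP's Fitness multiplies each objective value by its weight and keeps the result in `wvalues`, and larger weighted values count as better, so a weight of -1 turns that objective into a minimization. The sketch below imitates that in plain Python; `weighted_values` and the two candidate tuples are hypothetical, not DEAP code.

```python
from itertools import product


def weighted_values(values, weights):
    """Hypothetical imitation of DEAP's Fitness.wvalues: each value times its weight."""
    return tuple(v * w for v, w in zip(values, weights))


# Two made-up candidates scored on three objectives; they differ only in the last one.
a = (3.0, 2.0, 5.0)
b = (3.0, 2.0, 1.0)

# One run per sign pattern, like itertools.product([1, -1], repeat=3) in the question.
for weights in product([1, -1], repeat=3):
    # Tuple comparison is lexicographic; bigger weighted values count as better.
    winner = "a" if weighted_values(a, weights) > weighted_values(b, weights) else "b"
    print(weights, winner)
```

Here only the sign of the last weight decides the winner. Multi-objective selectors such as DEAP's `selNSGA2` rank individuals by Pareto dominance on the weighted values rather than by plain tuple order.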

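This seems like a natural case for a loop, but the problem is that these parameters have to be defined in the program's global scope (i.e., above the "main" function) so that all the subprocesses know about them. Here is some pseudocode: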

import multiprocessing

import numpy as np
from deap import base, creator, tools

# define deap parameters - have to be in the global scope
toolbox = base.Toolbox()
history = tools.History()
weights = (1.0, 1.0, -1.0)  # This is primarily what I want to vary
creator.create("Fitness", base.Fitness, weights=weights)
creator.create("Individual", np.ndarray, fitness=creator.Fitness)

def main():
    # run GA to solve multiobjective optimization problem
    my_optimized_values = None  # placeholder for the GA's result
    return my_optimized_values

if __name__ == '__main__':
    ## What I'd like to do but can't ##
    ## all_weights = list(itertools.product([1, -1], repeat=3))
    ## for combo in all_weights:
    ##     weights = combo
    ##
    pool = multiprocessing.Pool(processes=6)
    # This can be down here, and it distributes the GA computations to a pool of workers
    toolbox.register("map", pool.map)
    my_values = main()
    pool.close()
    pool.join()
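One reason the commented-out loop can't work as written, independent of multiprocessing: `creator.create` builds the Fitness class once, and (as far as I know) stores keyword arguments such as `weights` as class attributes at that moment. Rebinding the name `weights` afterwards doesn't touch the class that was already built. A plain-Python sketch, with a hypothetical `make_fitness` standing in for `creator.create`:

```python
def make_fitness(weights):
    # Hypothetical stand-in for creator.create("Fitness", base.Fitness, weights=weights):
    # the tuple is copied onto the class when the class is built.
    return type("Fitness", (object,), {"weights": tuple(weights)})


weights = (1, 1, -1)
Fitness = make_fitness(weights)  # module level: runs exactly once

weights = (-1, -1, -1)           # what "weights = combo" in the loop would do
print(Fitness.weights)           # (1, 1, -1): the class kept the old tuple

Fitness = make_fitness(weights)  # rebuilding the class is what picks up the change
print(Fitness.weights)           # (-1, -1, -1)
```

So each new set of weights needs the classes to be created again, in the parent and in every worker that handles individuals.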

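I've looked into various possibilities, such as multiprocessing.Value, the pathos fork of multiprocessing, and others, but in the end there is always a problem with the child processes reading the Individual class.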
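Why the children trip over the Individual class: as far as I understand, pickle (which multiprocessing uses to ship individuals to the workers) records an instance's class only by module and name, so the receiving process must already have that class defined in its own copy of `deap.creator`. The sketch below reproduces that failure without DEAP; `fake_creator`, `create` and `can_unpickle` are hypothetical stand-ins, not DEAP's API.

```python
import pickle
import sys
import types

# Hypothetical stand-in for deap.creator: a module the classes are created in.
fake_creator = types.ModuleType("fake_creator")
sys.modules["fake_creator"] = fake_creator


def create(name, base, **attrs):
    """Rough, hypothetical imitation of creator.create: build a class, store it on the module."""
    cls = type(name, (base,), dict(attrs, __module__="fake_creator"))
    setattr(fake_creator, name, cls)
    return cls


def can_unpickle(data):
    """True if this process can rebuild the pickled object."""
    try:
        pickle.loads(data)
        return True
    except (AttributeError, pickle.UnpicklingError):
        return False


Individual = create("Individual", list, weights=(1.0,))
# The pickle records only "fake_creator" + "Individual", plus the list contents.
data = pickle.dumps(Individual([1, 0, 1]))

del fake_creator.Individual   # like a spawned child that never ran create()
print(can_unpickle(data))     # False

create("Individual", list, weights=(1.0,))  # like a Pool initializer re-running create()
print(can_unpickle(data))     # True
```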

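I've asked this question on the deap user group, but it isn't nearly as large as SO. Besides, this seems to me more like a general conceptual Python question than a deap-specific one. My current workaround is simply to run the code several times, changing some of the parameter definitions each time. At least that way the GA computations are still parallelized, but it takes more manual intervention than I'd like.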

Any input or suggestions would be greatly appreciated!

score 0 · Accepted Answer

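Use the initializer/initargs keyword arguments of Pool to pass different values for the global variables that need to change on each run. The initializer function is called, with initargs as its arguments, in each worker process inside your Pool as soon as that worker starts up. You can set your global variables to the desired values there, and they will stay set correctly in each child for the lifetime of the pool.

You'll need to create a different Pool for each run, but that shouldn't be a problem: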

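import itertools
import multiprocessing

import numpy as np
from deap import base, creator, tools

toolbox = base.Toolbox()
history = tools.History()
weights = None  # We'll set this in the children later.


def init(_weights):
    # This will run in each child process (and, called by hand, in the parent).
    global weights
    weights = _weights
    creator.create("Fitness", base.Fitness, weights=weights)
    creator.create("Individual", np.ndarray, fitness=creator.Fitness)


def main():
    # Run the GA here; toolbox.map is the pool's map registered below.
    ...


if __name__ == '__main__':
    all_weights = list(itertools.product([1, -1], repeat=3))
    for combo in all_weights:
        weights = combo
        init(weights)  # the parent builds the individuals, so it needs the classes too
        pool = multiprocessing.Pool(processes=6, initializer=init, initargs=(weights,))
        toolbox.register("map", pool.map)
        my_values = main()
        pool.close()
        pool.join()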

score 0 · Accepted Answer

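I'm also uncomfortable with DEAP's reliance on the global scope, and I think I can offer you an alternative solution.

You can move all of the DEAP setup into a function that is called once per loop iteration and bind each module to a local name there, which avoids any dependence on the global scope. (Strictly speaking, importlib.import_module returns the module already cached in sys.modules, so these are new names for the same module objects rather than independent copies; what changes on each iteration is that creator.create runs again and overwrites the previous classes.)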

import importlib

this_random = importlib.import_module("random")
this_creator = importlib.import_module("deap.creator")
this_algorithms = importlib.import_module("deap.algorithms")
this_base = importlib.import_module("deap.base")
this_tools = importlib.import_module("deap.tools")

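As far as I can tell, this works fine with multiprocessing, at least when the pool's workers are started with the fork start method.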
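A quick way to check what `importlib.import_module` returns: repeated calls hand back the module object cached in `sys.modules`, so each call yields the same module rather than a fresh copy. `random` is used here only because it is in the standard library.

```python
import importlib
import sys

first = importlib.import_module("random")
second = importlib.import_module("random")

# Both calls return the one module object cached in sys.modules ...
print(first is second)                  # True
print(first is sys.modules["random"])   # True

# ... so state set through one name is visible through the other.
first.seed(42)
a = second.random()
first.seed(42)
b = first.random()
print(a == b)                           # True
```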

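For example, here is a version of DEAP's onemax_mp.py that avoids putting any DEAP objects in the global scope. I've included a loop in __main__ that changes the weights on each iteration (the first time it maximizes the number of ones, the second time it minimizes it). Everything works fine with multiprocessing.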

#!/usr/bin/env python2.7
#    This file is part of DEAP.
#
#    DEAP is free software: you can redistribute it and/or modify
#    it under the terms of the GNU Lesser General Public License as
#    published by the Free Software Foundation, either version 3 of
#    the License, or (at your option) any later version.
#
#    DEAP is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#    GNU Lesser General Public License for more details.
#
#    You should have received a copy of the GNU Lesser General Public
#    License along with DEAP. If not, see <http://www.gnu.org/licenses/>.

import array
import multiprocessing
import sys

if sys.version_info < (2, 7):
    print("mpga_onemax example requires Python >= 2.7.")
    sys.exit(1)

import numpy
import importlib


def evalOneMax(individual):
    return sum(individual),


def do_onemax_mp(weights, random_seed=None):
    """ Run the onemax problem with the given weights and random seed. """

    # create local copies of each module
    this_random = importlib.import_module("random")
    this_creator = importlib.import_module("deap.creator")
    this_algorithms = importlib.import_module("deap.algorithms")
    this_base = importlib.import_module("deap.base")
    this_tools = importlib.import_module("deap.tools")

    # hoisted from global scope
    this_creator.create("FitnessMax", this_base.Fitness, weights=weights)
    this_creator.create("Individual", array.array, typecode='b',
                        fitness=this_creator.FitnessMax)
    this_toolbox = this_base.Toolbox()
    this_toolbox.register("attr_bool", this_random.randint, 0, 1)
    this_toolbox.register("individual", this_tools.initRepeat,
                          this_creator.Individual, this_toolbox.attr_bool, 100)
    this_toolbox.register("population", this_tools.initRepeat, list,
                          this_toolbox.individual)
    this_toolbox.register("evaluate", evalOneMax)
    this_toolbox.register("mate", this_tools.cxTwoPoint)
    this_toolbox.register("mutate", this_tools.mutFlipBit, indpb=0.05)
    this_toolbox.register("select", this_tools.selTournament, tournsize=3)

    # hoisted from __main__
    this_random.seed(random_seed)
    pool = multiprocessing.Pool(processes=4)
    this_toolbox.register("map", pool.map)
    pop = this_toolbox.population(n=300)
    hof = this_tools.HallOfFame(1)
    this_stats = this_tools.Statistics(lambda ind: ind.fitness.values)
    this_stats.register("avg", numpy.mean)
    this_stats.register("std", numpy.std)
    this_stats.register("min", numpy.min)
    this_stats.register("max", numpy.max)

    this_algorithms.eaSimple(pop, this_toolbox, cxpb=0.5, mutpb=0.2, ngen=40,
                             stats=this_stats, halloffame=hof)

    # Shut the pool down and wait for its workers before the next iteration.
    pool.close()
    pool.join()
    return hof

if __name__ == "__main__":
    for tgt_weights in ((1.0,), (-1.0,)):
        do_onemax_mp(tgt_weights)
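The loop above works because do_onemax_mp calls creator.create before it creates the Pool, so workers started with the fork start method inherit the freshly built classes. With spawn (the only option on Windows, and the default on macOS since Python 3.8), workers start from a fresh interpreter and never run do_onemax_mp, so they would not have creator.Individual; the initializer approach in the first answer avoids that. The sketch below reproduces the ordering in plain Python; `registry`, `evaluate` and `run_with_weight` are hypothetical, and it forces the fork context, which exists only on POSIX systems.

```python
import multiprocessing
import sys
import types

# Hypothetical stand-in for deap.creator: the module the classes are created in.
registry = types.ModuleType("registry")
sys.modules["registry"] = registry


def evaluate(individual):
    # Runs in a worker; `weight` comes from the worker's own copy of the class.
    return individual.weight * sum(individual)


def run_with_weight(weight, processes=2):
    # 1. Build the class first (stand-in for creator.create inside do_onemax_mp).
    cls = type("Individual", (list,), {"__module__": "registry", "weight": weight})
    registry.Individual = cls
    # 2. Then fork the workers, so each one inherits the class just built.
    ctx = multiprocessing.get_context("fork")  # POSIX only
    with ctx.Pool(processes) as pool:
        return pool.map(evaluate, [cls([1, 2]), cls([3, 4])])


if __name__ == "__main__":
    for w in (1, -1):
        print(w, run_with_weight(w))
```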
