
My code (part of a genetic optimization algorithm) runs several processes in parallel, waits for all of them to finish, reads the output, and then repeats with different inputs. Everything worked fine when I repeated the test 60 times. Since it worked, I decided to use a more realistic number of repetitions, 200. I received this error:

File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
 self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
 self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 302, in _handle_workers
 pool._maintain_pool()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 206, in _maintain_pool
 self._repopulate_pool()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 199, in _repopulate_pool
 w.start()
File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
 self._popen = Popen(self)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 120, in __init__
 self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Here is the snippet of my code that uses the pool:

def RunMany(inputs):
    from multiprocessing import cpu_count, Pool
    proc = inputs[0]
    pool = Pool(processes=proc)
    results = []
    for arg1 in inputs[1]:
        for arg2 in inputs[2]:
            for arg3 in inputs[3]:
                results.append(pool.apply_async(RunOne, args=(arg1, arg2, arg3)))
    casenum = 0
    datadict = dict()
    for p in results:
        # get results of simulation once it has finished
        datadict[casenum] = p.get()
        casenum += 1
    return datadict

The RunOne function creates an object from a class I wrote, solves a chemistry problem using a computationally expensive Python package (each solve takes about 30 seconds), and returns the object holding the chemistry solver's output.

So my code calls RunMany serially, and RunMany calls RunOne in parallel. In my tests, I used 10 processors (the computer has 16) and a pool of 20 calls to RunOne; in other words, len(arg1)*len(arg2)*len(arg3) = 20. Everything works when my code calls RunMany 60 times, but it runs out of memory when I call it 200 times.

Does this mean some of the processes aren't cleaning up after themselves? Do I have a memory leak? How can I determine whether I have a memory leak, and how do I find its cause? The only item that grows in my 200-iteration loop is a list of numbers that grows from size 0 to length 200. I also have a dictionary of objects from a custom class I built, but it is capped at 50 entries: each time the loop executes, one item is deleted from the dictionary and replaced with another.
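One quick way to check whether worker processes are being leaked (a diagnostic idea, not from the original post) is to count live children with `multiprocessing.active_children()` around each call to RunMany; if the count keeps climbing across iterations, the pools are not being cleaned up. A minimal sketch (the function name `count_leaked_workers` is hypothetical):

```python
from multiprocessing import Pool, active_children

def count_leaked_workers():
    """Create a pool, then verify its workers are reaped after close/join."""
    pool = Pool(processes=2)
    alive_before = len(active_children())  # the two workers are running here
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to exit and be reaped
    alive_after = len(active_children())   # workers should be gone now
    return alive_before, alive_after
```

Printing these counts inside the 200-iteration loop would show whether orphaned workers accumulate over time.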

Edit: Here is the snippet of code that calls RunMany:

for run in range(nruns):
    #create inputs object for RunMany using genetic methods. 
    #Either use starting "population" or create "child" inputs from successful previous runs
    datadict = RunMany(inputs)

    sumsquare=0
    for i in range(len(datadict)): #input condition
        sumsquare+=Compare(datadict[i],Target[i]) #compare result to target

    with open(os.path.join(mainpath,'Outputs','output.txt'),'a') as f:
        f.write('\t'.join([str(x) for x in [inputs.name, sumsquare]])+'\n')

    Objective.append(sumsquare) #add sum of squares to list, to be plotted outside of loop
    population[inputs]=sumsquare #add/update the model in the "population", using the inputs object as a key, and its objective function as the value
    if len(population)>initialpopulation:
        population = PopulationReduction(population) #reduce the "population" by "killing" unfit "genes"
    avgtime=(datetime.datetime.now()-starttime2)//(run+1)
    remaining=(nruns-run-1)*avgtime
    print(' Finished '+str(run+1)+' / ' +str(nruns)+'. Elapsed: '+str(datetime.datetime.now().replace(microsecond=0)-starttime)+' Remaining: '+str(remaining)+' Finish at '+str((datetime.datetime.now()+remaining).replace(microsecond=0))+'~~~', end="\r")

1 Answer


As indicated in the comments on my question, the answer came from Puciek.

The solution is to close the pool of processes once it is finished with. I had assumed it would be cleaned up automatically, because the results variable is local to RunMany and would be deleted once RunMany completed. However, Python doesn't always work as expected.

The fixed code is:

def RunMany(inputs):
    from multiprocessing import cpu_count, Pool
    proc = inputs[0]
    pool = Pool(processes=proc)
    results = []
    for arg1 in inputs[1]:
        for arg2 in inputs[2]:
            for arg3 in inputs[3]:
                results.append(pool.apply_async(RunOne, args=(arg1, arg2, arg3)))
    # new section
    pool.close()
    pool.join()
    # end new section
    casenum = 0
    datadict = dict()
    for p in results:
        # get results of simulation once it has finished
        datadict[casenum] = p.get()
        casenum += 1
    return datadict
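As a side note (an alternative not part of the original answer): on Python 3, `multiprocessing.Pool` is a context manager, so the cleanup can be written with a `with` block. Note that exiting the block calls `terminate()` rather than `close()`, so all results must be fetched before the block ends. A sketch, where `square` and `run_many` are hypothetical stand-ins for RunOne and RunMany:

```python
from multiprocessing import Pool

def square(x):
    # stand-in for the expensive RunOne call
    return x * x

def run_many(nproc, values):
    # Python 3 only: the with-block guarantees the pool is torn down
    # (via terminate()) even if an exception is raised, so worker
    # processes cannot leak across repeated calls.
    with Pool(processes=nproc) as pool:
        async_results = [pool.apply_async(square, (v,)) for v in values]
        # fetch every result inside the block, before terminate() runs
        return {i: r.get() for i, r in enumerate(async_results)}
```

This removes the need to remember the explicit close()/join() pair that fixed the bug above.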
answered Nov 3, 2014 at 22:25