I am using multiprocessing in my Python code to run a function asynchronously:
import multiprocessing

po = multiprocessing.Pool()
results = []
for element in a_list:
    results.append(po.apply_async(my_module.my_function, (some_arguments, element, a_big_argument)))
po.close()
po.join()
for r in results:
    a_new_set.add(r.get())
a_big_argument is a dictionary that I pass as an argument. It is big, somewhere between 10 and 100 MB, and it seems to have a major impact on the performance of my code.
I am probably doing something silly and inefficient here, because the performance of my code really dropped with this new argument.
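To get a feel for the cost, here is a rough sketch I used to time serializing a dictionary of a comparable shape (my understanding is that apply_async pickles every argument for each task, so the dictionary would be serialized on every single call; the dictionary contents below are only an assumption about its shape):

```python
import pickle
import time

# build a dictionary roughly comparable to a_big_argument (shape assumed)
big_dict = {key: 1 for key in range(1000000)}

start = time.perf_counter()
payload = pickle.dumps(big_dict)  # this serialization happens once per apply_async call
elapsed = time.perf_counter() - start

print(len(payload), "bytes pickled in", elapsed, "seconds")
```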
What is the best way to handle a big dictionary like this? I do not want to load it inside my function every time. Would creating a database and connecting to it be a solution?
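One approach I have read about is to make the dictionary a module-level global that is filled before the Pool is created, so that the worker processes inherit it copy-on-write instead of receiving it as a pickled argument on every task. This is a minimal sketch under the assumption of a POSIX system with the "fork" start method (it will not work on Windows); the names SHARED, lookup, and main are mine:

```python
import multiprocessing

SHARED = {}  # filled before the Pool is created

def lookup(key):
    # child processes inherit SHARED via fork (copy-on-write);
    # nothing large is pickled per task, only the small key
    return SHARED[key]

def main():
    SHARED.update({key: key * key for key in range(1000)})
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    with ctx.Pool() as pool:
        return pool.map(lookup, [1, 2, 3])

if __name__ == '__main__':
    print(main())  # -> [1, 4, 9]
```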
Here is code that you can run:
'''
Created on Mar 11, 2013

@author: Antonin
'''

import math
import multiprocessing
import random

# generate an artificially big dictionary
def generateBigDict():
    myBigDict = {}
    for key in range(0, 1000000):
        myBigDict[key] = 1
    return myBigDict

def myMainFunction():
    # load the dictionary
    myBigDict = generateBigDict()
    # create a list on which we will asynchronously run the subfunction
    myList = []
    for list_element in range(0, 20):
        myList.append(random.randrange(0, 1000000))
    # an empty set to receive results
    set_of_results = set()
    # there is a for loop here on one of the arguments
    for loop_element in range(0, 150):
        results = []
        # asynchronously run the subfunction
        po = multiprocessing.Pool()
        for list_element in myList:
            results.append(po.apply_async(mySubFunction, (loop_element, list_element, myBigDict)))
        po.close()
        po.join()
        for r in results:
            set_of_results.add(r.get())
    for element in set_of_results:
        print(element)

def mySubFunction(loop_element, list_element, myBigDict):
    intermediaryResult = myBigDict[list_element]
    finalResult = intermediaryResult + loop_element
    return math.log(finalResult)

if __name__ == '__main__':
    myMainFunction()
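For comparison, here is a sketch of the same program reworked so that the Pool is created once (instead of once per loop iteration) and the dictionary is transferred to each worker exactly once, through the initializer/initargs parameters of multiprocessing.Pool. The sizes are parameters only so the sketch stays quick to run, and every name ending in Shared (plus _big and _initWorker) is mine:

```python
import math
import multiprocessing
import random

_big = None  # module-level slot for the shared dictionary

def _initWorker(big):
    # runs once in each worker process when the Pool starts
    global _big
    _big = big

def mySubFunctionShared(loop_element, list_element):
    # reads the dictionary from the worker-local global; the per-task
    # arguments are now just two small integers
    return math.log(_big[list_element] + loop_element)

def myMainFunctionShared(dict_size=1000000, n_loops=150):
    myBigDict = {key: 1 for key in range(dict_size)}
    myList = [random.randrange(0, dict_size) for _ in range(20)]
    set_of_results = set()
    # the Pool is created once, outside the loop, and the dictionary is
    # sent to each worker a single time via initargs
    with multiprocessing.Pool(initializer=_initWorker,
                              initargs=(myBigDict,)) as po:
        for loop_element in range(n_loops):
            results = [po.apply_async(mySubFunctionShared,
                                      (loop_element, list_element))
                       for list_element in myList]
            for r in results:
                set_of_results.add(r.get())
    return set_of_results

if __name__ == '__main__':
    for element in sorted(myMainFunctionShared(dict_size=100000, n_loops=10)):
        print(element)
```

Since every value in the generated dictionary is 1, each loop_element produces the single result math.log(1 + loop_element), so the returned set has one entry per loop iteration.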