I want to load a huge dict / 2D array in the parent process and share it among child processes. The children only need to read it; they never modify it. I have several questions about multiprocessing.Pool and Manager (Python 3.6, Linux):

(1) Using a plain dict or list sometimes causes a "cannot allocate memory" error. Is this because of the fork() mechanism on Linux? Will the huge dict really be copied 30 times in memory if there are 30 processes?

(2) I tried using a Manager to work around this, since it is more convenient for dict and list types, but it is extremely slow and sometimes deadlocks (plenty of free memory, but 0% CPU usage). How can I fix this?
# simplified version
from multiprocessing import Pool, Manager
import logging

def process_pool_return_loaddata(method, data, core_num):
    pool = Pool(processes=core_num)
    pool_result = []
    try:
        pool_result = pool.map(method, data)
    except Exception as e:
        logging.error(e)
    finally:
        pool.close()
        pool.join()
    return pool_result

def func(data):
    value = data[1]  # was named `list`, which shadows the builtin
    return [data[0], value + 1]

if __name__ == '__main__':
    pairs = [[1, 2], [3, 4], [5, 6]]
    mgr = Manager()
    lst = mgr.list(pairs)  # was mgr.list(group_sid); group_sid was never defined
    set_subject = process_pool_return_loaddata(func, lst, 2)
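For question (1), my understanding is that fork() on Linux is copy-on-write, so children should initially share the parent's pages; the copies (and the "cannot allocate memory" errors) may come from CPython's reference counting writing to the objects and gradually dirtying those shared pages. The approach I have read about is to build the dict as a module-level global before creating the Pool, so the forked workers can read it without any Manager. A minimal sketch of what I mean (big_dict and lookup are illustrative names, not from my real code):

# Sketch: rely on fork() copy-on-write instead of a Manager (Linux only).
from multiprocessing import Pool

big_dict = {}  # filled in the parent *before* the Pool forks its workers

def lookup(key):
    # workers only read; pages stay shared until something writes to them
    return big_dict.get(key)

if __name__ == '__main__':
    big_dict = {i: i * i for i in range(10 ** 6)}
    with Pool(processes=4) as pool:
        results = pool.map(lookup, range(100))
    print(results[:5])

Is it correct that any remaining memory growth here would come only from refcount writes dirtying pages?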
(3) In another scenario, I want to define the manager.dict object as a global variable and read it directly in the child processes, while the parent process occasionally modifies it later. The implementation is similar to the one above, but again it is extremely slow even though only reads happen. How can I fix it? Is there some other solution?
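I suspect the slowness in (3) is because every read of a manager.dict goes through a proxy, i.e. one IPC round trip per access. A workaround I am considering is to pull a single plain-dict snapshot into each worker via the Pool initializer and read that locally (sketch below; shared_snapshot, init_worker, and read_one are my own names), though such a snapshot would of course not see the parent's later modifications:

# Sketch: snapshot the managed dict once per worker instead of going
# through the proxy on every read.
from multiprocessing import Pool, Manager

shared_snapshot = None  # set inside each worker by init_worker

def init_worker(plain_dict):
    global shared_snapshot
    shared_snapshot = plain_dict

def read_one(key):
    return shared_snapshot[key]  # plain local dict access, no IPC

if __name__ == '__main__':
    mgr = Manager()
    shared = mgr.dict({i: str(i) for i in range(1000)})
    snapshot = shared.copy()  # one round trip for the whole dict
    with Pool(processes=2, initializer=init_worker,
              initargs=(snapshot,)) as pool:
        print(pool.map(read_one, range(10)))

Is there a better way to handle the case where the parent modifies the dict afterwards? Thank you for any advice!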