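I'm reading various tutorials on the `multiprocessing` module in Python, and I'm having trouble understanding why and when to call `process.join()`. For example, I stumbled across this example: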
```python
import math
import multiprocessing
from multiprocessing import Queue


def factorize_naive(n):
    # Hypothetical stand-in: the example does not show factorize_naive.
    # Naive trial division; returns the list of prime factors of n.
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outdict = {}
    for n in nums:
        outdict[n] = factorize_naive(n)
    out_q.put(outdict)


if __name__ == "__main__":
    nums = range(100000)
    nprocs = 4

    # Each process will get 'chunksize' nums and a queue to put its out
    # dict into
    out_q = Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

    print(resultdict)
```
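To look at the start/join lifecycle that the example relies on in isolation, here is a minimal sketch. `slow_task` and `join_lifecycle_demo` are hypothetical names, and the sleep and timeout values are arbitrary. They assume the child cannot finish inside the 0.1 s timeout.

```python
import multiprocessing
import time


def slow_task(seconds):
    # Hypothetical worker: just sleeps so the parent can observe it running.
    time.sleep(seconds)


def join_lifecycle_demo():
    p = multiprocessing.Process(target=slow_task, args=(0.5,))
    p.start()
    # While the child is still running, exitcode is None.
    running_exitcode = p.exitcode
    # join() with a timeout returns after at most `timeout` seconds,
    # whether or not the child has finished.
    p.join(timeout=0.1)
    alive_after_timeout = p.is_alive()
    # join() without a timeout blocks until the child has terminated.
    p.join()
    return running_exitcode, alive_after_timeout, p.is_alive(), p.exitcode


if __name__ == "__main__":
    print(join_lifecycle_demo())
```

Because `join()` returns `None` both when the child exits and when the timeout expires, the sketch checks `is_alive()` and `exitcode` afterwards to see which case occurred.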
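As I understand it, `process.join()` blocks the calling process until the process whose `join` method was called has finished executing. I also believe that the child processes started in the example above finish executing once they complete their target function, that is, after they push their results onto `out_q`. Finally, I believe that `out_q.get()` blocks the calling process until there is a result to pull. So consider this code: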
```python
resultdict = {}
for i in range(nprocs):
    resultdict.update(out_q.get())

# Wait for all worker processes to finish
for p in procs:
    p.join()
```
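The order of the two loops in this snippet matters. The Python `multiprocessing` documentation's programming guidelines ("Joining processes that use queues") warn about this. A process that has put items on a queue will not terminate until its feeder thread has flushed every buffered item into the underlying pipe. So joining such a process before draining the queue can block. The following is a minimal sketch of that effect. `big_worker` and `drain_then_join_demo` are hypothetical names, and the roughly 10 MB payload is an arbitrary size chosen to exceed a typical OS pipe buffer.

```python
import multiprocessing


def big_worker(out_q):
    # Hypothetical worker: puts one large object (about 10 MB) on the queue.
    out_q.put(b"x" * 10_000_000)


def drain_then_join_demo():
    out_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=big_worker, args=(out_q,))
    p.start()
    # Joining *before* draining the queue: the child cannot exit while its
    # queue feeder thread is still writing the payload into the pipe, so a
    # bounded join() times out with the child still alive.
    p.join(timeout=1.0)
    stuck_before_get = p.is_alive()
    # Draining the queue lets the feeder thread finish, so the child can exit.
    payload = out_q.get()
    p.join()
    return stuck_before_get, len(payload), p.exitcode


if __name__ == "__main__":
    print(drain_then_join_demo())
```

With an unbounded `p.join()` in place of the timed one, the same code would deadlock. The snippet above avoids this because it calls `get()` first and `join()` second.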
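The main process is blocked by the `out_q.get()` calls until every worker process has finished pushing its results onto the queue. So by the time the main process leaves the `for` loop, every child process should already have finished executing, right?

If that's the case, is there any reason to call the `p.join()` methods at that point? Haven't all the worker processes already finished? How, then, does this make the main process "wait for all worker processes to finish"? I ask mainly because I've seen this pattern in several different examples, and I'm curious whether I'm failing to understand something.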
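One way to test the assumption that "`get()` has returned, so the worker must be done" is to check `is_alive()` right after `get()`. In this minimal sketch, `put_then_linger` and `get_vs_join_demo` are hypothetical names. The `time.sleep(0.5)` after `put()` is an artificial stand-in for anything a worker might still be doing once its result is queued.

```python
import multiprocessing
import time


def put_then_linger(out_q):
    out_q.put("result")
    # Hypothetical extra work after put(); stands in for anything the worker
    # might still be doing once its result has been queued.
    time.sleep(0.5)


def get_vs_join_demo():
    out_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=put_then_linger, args=(out_q,))
    p.start()
    item = out_q.get()           # returns as soon as the item arrives
    alive_after_get = p.is_alive()
    p.join()                     # returns only once the child has exited
    return item, alive_after_get, p.is_alive(), p.exitcode


if __name__ == "__main__":
    print(get_vs_join_demo())
```

In this sketch, `get()` returning only shows that the item reached the parent. It says nothing about whether the child process has exited, which is the state `join()` waits for.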