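I'm running into this problem in Python:
- I have a queue of URLs that I need to check from time to time
- If the queue has filled up, I need to process every item in it
- Each item in the queue must be handled by its own process (multiprocessing)
So far I have managed to do this "manually", like this: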
    while 1:
        self.updateQueue()
        while not self.mainUrlQueue.empty():
            domain = self.mainUrlQueue.get()
            # if we haven't launched maxprocess processes yet, start a new one
            if len(self.jobs) < maxprocess:
                self.startJob(domain)
                #time.sleep(1)
            else:
                # All slots are taken: clear a finished process out of the pool and start a new one
                jobdone = 0
                # Cycle through the processes until we find a free one; only then leave the loop
                while jobdone == 0:
                    for p in self.jobs:
                        #print "entering loop"
                        # if the process has finished
                        if not p.is_alive() and jobdone == 0:
                            #print str(p.pid) + " job dead, starting new one"
                            self.jobs.remove(p)
                            self.startJob(domain)
                            jobdone = 1
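However, this leads to a lot of problems and errors. I wonder whether I would be better off using a pool of processes. What would be the right way to do this?
That said, my queue is empty much of the time and can then fill up with 300 items within a second, so I'm not sure how to handle that here.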
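
For reference, here is a minimal sketch of what the pool-based variant might look like. It is only an illustration under some assumptions: `process_domain`, `drain` and `run_batch` are hypothetical names, `process_domain` stands in for whatever `startJob` actually runs for each URL, and the queue is assumed to be a standard `queue.Queue`. The pool caps the number of simultaneous workers itself, so the manual `is_alive()` polling goes away.

```python
import queue
from multiprocessing import Pool


def process_domain(domain):
    """Hypothetical worker: placeholder for the real per-URL job."""
    return domain, len(domain)


def drain(url_queue):
    """Collect everything currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(url_queue.get_nowait())
        except queue.Empty:
            return items


def run_batch(pool, url_queue):
    """Hand the current queue contents to the pool and wait for the results."""
    batch = drain(url_queue)
    # imap_unordered yields each result as soon as a worker finishes it;
    # the pool never runs more tasks at once than it has worker processes.
    return list(pool.imap_unordered(process_domain, batch))


if __name__ == "__main__":
    url_queue = queue.Queue()
    for d in ("example.com", "example.org", "example.net"):
        url_queue.put(d)

    # processes=4 plays the role of maxprocess in the loop above (assumed value).
    with Pool(processes=4) as pool:
        print(sorted(run_batch(pool, url_queue)))
```

In the real loop, `run_batch` would presumably be called after each `self.updateQueue()`, with a short sleep whenever the queue turns out to be empty. A burst of 300 items is then simply queued inside the pool and worked off a few at a time. If each item really has to run in a fresh process rather than a reused worker, passing `maxtasksperchild=1` to `Pool` is one way to get that.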