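I am trying to pass a shared counter to tasks in multiprocessing via apply_async, but it fails with the error "RuntimeError: Synchronized objects should only be shared between processes through inheritance". What is going on?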

import multiprocessing
import time
from multiprocessing import Pool

def processLine(lines, counter, mutex):
    pass

counter = multiprocessing.Value('i', 0)
mutex = multiprocessing.Lock()
pool = Pool(processes = 8)
lines = []

for line in inputStream:
    lines.append(line)
    if len(lines) >= 5000:
        # don't queue more than 1'000'000 lines
        while counter.value > 1000000:
            time.sleep(0.05)
        mutex.acquire()
        counter.value += len(lines)
        mutex.release()
        # passing counter as a task argument is what triggers the RuntimeError
        pool.apply_async(processLine, args=(lines, counter, ), callback = collectResults)
        lines = []

2 Answers

Let the pool handle the scheduling:

for result in pool.imap(process_single_line, input_stream):
    pass

If the order of the results doesn't matter:

for result in pool.imap_unordered(process_single_line, input_stream):
    pass

The pool.*map*() functions take a chunksize argument; you can change it to see whether it affects performance in your case.
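
To make the chunksize suggestion concrete, here is a minimal sketch. The names process_single_line and total_length and the sample data are assumptions made up for illustration; only Pool.imap_unordered and its chunksize parameter come from the standard library.

```python
from multiprocessing import Pool

def process_single_line(line):
    # Hypothetical stand-in for the real per-line work.
    return len(line.strip())

def total_length(pool, lines, chunksize=1000):
    # chunksize is the number of lines handed to a worker per task;
    # larger values mean fewer, bigger messages between processes.
    return sum(pool.imap_unordered(process_single_line, lines, chunksize))

if __name__ == "__main__":
    sample = ["line %d\n" % i for i in range(10000)]
    with Pool(processes=4) as pool:
        print(total_length(pool, sample, chunksize=500))
```

Timing a few different chunksize values on your real input is probably the simplest way to pick one.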

If your code expects several lines to be passed in a single call:

from itertools import zip_longest  # izip_longest on Python 2

# grouper recipe: 5000-line chunks; the last chunk is padded with ''
chunks = zip_longest(*[iter(inputStream)]*5000, fillvalue='')
for result in pool.imap(process_lines, chunks):
    pass

Some alternatives for limiting the number of queued items:

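  • Use a multiprocessing.Queue with a maximum size (in this case you don't need a pool). queue.put() blocks once the maximum size is reached, until another process calls queue.get().
  • Implement the producer/consumer pattern manually with multiprocessing primitives such as Condition or BoundedSemaphore.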
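
The bounded-queue alternative could look roughly like the sketch below. The worker layout, the SENTINEL marker, and the names process_lines, worker and run are assumptions for illustration, not something taken from the question.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-input marker

def process_lines(lines):
    # Hypothetical stand-in for the real work on one chunk of lines.
    return sum(len(line) for line in lines)

def worker(tasks, results):
    # Pull chunks until the sentinel arrives, pushing one result per chunk.
    for lines in iter(tasks.get, SENTINEL):
        results.put(process_lines(lines))

def run(chunks, n_workers=4, max_queued=200):
    tasks = mp.Queue(maxsize=max_queued)
    results = mp.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    submitted = 0
    for chunk in chunks:
        tasks.put(chunk)  # blocks while max_queued chunks are waiting
        submitted += 1
    for _ in workers:
        tasks.put(SENTINEL)  # one sentinel per worker
    collected = [results.get() for _ in range(submitted)]
    for w in workers:
        w.join()
    return collected

if __name__ == "__main__":
    chunks = (["x" * 10] * 5000 for _ in range(20))
    print(sum(run(chunks)))
```

Here the queues are handed to the workers as Process arguments at start-up, which is the kind of sharing "through inheritance" that the error message asks for.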

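Note: every Value has an associated lock (available via counter.get_lock()), so you don't need a separate Lock. An update such as counter.value += n is still a read-modify-write, so hold that lock around it.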

answered 2012-12-27T16:01:31.353

I solved it in a not very elegant way:

import time
from multiprocessing import Pool

def processLine(lines):
    # placeholder for the real work; the result must support len()
    return lines

def collectResults(result):
    # callbacks run in the parent process, so a plain global works
    global counter
    counter -= len(result)

counter = 0
pool = Pool(processes = 8)
lines = []

for line in inputStream:
    lines.append(line)
    if len(lines) >= 5000:
        # don't queue more than 1'000'000 lines
        while counter > 1000000:
            time.sleep(0.05)
        counter += len(lines)
        pool.apply_async(processLine, args=(lines,), callback = collectResults)
        lines = []
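
The same idea can be written without polling. Because apply_async callbacks run in a thread of the parent process, an ordinary threading.BoundedSemaphore is enough to cap the number of chunks in flight. This is only a sketch: submit_throttled, process_lines and max_pending are made-up names, and it assumes Python 3 for error_callback.

```python
import threading
from multiprocessing import Pool

def process_lines(lines):
    # Hypothetical stand-in for the real work on one chunk.
    return len(lines)

def submit_throttled(pool, chunks, max_pending=200):
    slots = threading.BoundedSemaphore(max_pending)

    def release(_):
        # Called in the parent when a task finishes or fails.
        slots.release()

    pending = []
    for chunk in chunks:
        slots.acquire()  # blocks while max_pending chunks are in flight
        pending.append(pool.apply_async(process_lines, (chunk,),
                                        callback=release,
                                        error_callback=release))
    return [p.get() for p in pending]

if __name__ == "__main__":
    chunks = [["x"] * 5000 for _ in range(50)]
    with Pool(processes=8) as pool:
        print(sum(submit_throttled(pool, chunks, max_pending=10)))
```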
answered 2012-12-27T13:39:38.327