可能重复:
多处理启动太多 Python VM 实例
我正在尝试使用 python 多进程来并行化 Web 获取,但我发现调用多进程的应用程序被多次实例化,而不仅仅是我想要调用的函数(这对我来说是个问题,因为调用者对一个实例化缓慢的库 - 失去了我从并行性中获得的大部分性能收益)。
我做错了什么或如何避免?
我的应用程序.py:
from url_fetcher import url_fetch, parallel_fetch
import my_slow_stuff
my_slow_stuff.py:
if __name__ == '__main__':
import datetime
urls = ['http://www.microsoft.com'] * 20
results = parallel_fetch(urls, fn=url_fetch)
print([x[:20] for x in results])
class MySlowStuff(object):
import time
print('doing slow stuff')
time.sleep(0)
print('done slow stuff')
url_fetcher.py:
import multiprocessing
import urllib
def url_fetch(url):
#return urllib.urlopen(url).read()
return url
def parallel_fetch(urls, fn):
PROCESSES = 10
CHUNK_SIZE = 1
pool = multiprocessing.Pool(PROCESSES)
results = pool.imap(fn, urls, CHUNK_SIZE)
return results
if __name__ == '__main__':
import datetime
urls = ['http://www.microsoft.com'] * 20
results = parallel_fetch(urls, fn=url_fetch)
print([x[:20] for x in results])
部分输出:
$ python my_app.py
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
...