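Here are some performance numbers for Python 2.6.x that call into question the notion that threading outperforms multiprocessing in IO-bound scenarios. These results come from a 40-processor IBM System x3650 M4 BD.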
IO-Bound Processing: Process Pool performed better than Thread Pool
>>> do_work(50, 300, 'thread','fileio')
do_work function took 455.752 ms
>>> do_work(50, 300, 'process','fileio')
do_work function took 319.279 ms
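The usual argument for threads in IO-bound work is that CPython releases the GIL while a thread waits in a blocking call, so many waits can overlap inside one process. A minimal sketch of that effect, assuming Python 3: `time.sleep` stands in for a blocking read (it also releases the GIL), and `blocking_task` and `time_blocking` are hypothetical helpers, not part of the benchmark. With as many threads as tasks, the total should come out near one sleep interval rather than the sum of all of them.

```python
import time
from multiprocessing.pool import ThreadPool

def blocking_task(seconds):
    """Hypothetical stand-in for blocking I/O: sleeps without holding the GIL."""
    time.sleep(seconds)
    return seconds

def time_blocking(workers, tasks, seconds):
    """Return wall-clock seconds to run `tasks` sleeps on `workers` threads."""
    pool = ThreadPool(processes=workers)
    try:
        start = time.perf_counter()
        pool.map(blocking_task, [seconds] * tasks)
        return time.perf_counter() - start
    finally:
        # Shut the worker threads down whether or not map() succeeded
        pool.close()
        pool.join()

if __name__ == '__main__':
    # 10 sleeps of 0.1 s take about 1 s when run one after another
    print('1 thread:   %0.3f s' % time_blocking(1, 10, 0.1))
    print('10 threads: %0.3f s' % time_blocking(10, 10, 0.1))
```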
CPU-Bound Processing: Process Pool performed better than Thread Pool
>>> do_work(50, 2000, 'thread','square')
do_work function took 338.309 ms
>>> do_work(50, 2000, 'process','square')
do_work function took 287.488 ms
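`square` does so little work per call (one multiplication) that these timings mostly reflect task dispatch and pool overhead rather than computation. A sketch of a heavier CPU-bound comparison, assuming Python 3: `busy_sum` is a hypothetical task, a pure-Python loop that holds the GIL while it runs, and the hypothetical `time_map` helper creates the pool before starting the clock so that only the work itself is timed. In CPython, threads generally cannot execute such a loop in parallel because of the GIL, whereas separate processes can.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def busy_sum(n):
    """Hypothetical CPU-heavy task: a pure-Python loop that holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def time_map(make_pool, workers, tasks, n):
    """Time pool.map over `tasks` calls of busy_sum(n); pool startup is excluded."""
    pool = make_pool(processes=workers)
    try:
        start = time.perf_counter()
        results = pool.map(busy_sum, [n] * tasks)
        elapsed = time.perf_counter() - start
    finally:
        pool.close()
        pool.join()
    return elapsed, results

if __name__ == '__main__':
    for name, make_pool in (('ThreadPool', ThreadPool), ('Pool', Pool)):
        elapsed, _ = time_map(make_pool, 8, 32, 200000)
        print('%s: %0.3f ms' % (name, elapsed * 1000.0))
```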
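These are not rigorous tests, but they suggest to me that multiprocessing is not inherently slower than threading, even for IO-bound work.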
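One confound worth isolating: `do_work` constructs a fresh pool of 50 workers inside the timed call, so each measurement includes starting 50 threads or 50 processes. A minimal sketch, assuming Python 3 and a hypothetical helper `time_pool_startup`, that times only creating an idle pool and then closing and joining it, with no tasks submitted:

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def time_pool_startup(make_pool, workers):
    """Return seconds spent creating, then closing and joining, an idle pool."""
    start = time.perf_counter()
    pool = make_pool(processes=workers)
    pool.close()
    pool.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    for name, make_pool in (('ThreadPool', ThreadPool), ('Pool', Pool)):
        print('%s startup + shutdown: %0.3f ms'
              % (name, time_pool_startup(make_pool, 50) * 1000.0))
```

Comparing these figures against the `do_work` timings gives a rough idea of how much of each result is pool setup rather than the tasks themselves.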
Code used for the tests above, run from an interactive Python console:
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
import functools
import os
import time

# Payload for the IO test: the text form of a list of 99,999 integers
text_for_test = str(list(range(1, 100000)))

def fileio(i):
    """IO-bound task: write the payload to a per-task file, then read it back."""
    path = './test/test-' + str(i)
    # Mode 'w' truncates any file left over from an earlier run
    with open(path, 'w') as f:
        f.write(text_for_test)
    with open(path, 'r') as f:
        f.read()

def square(i):
    """CPU-bound task (very little work per call)."""
    return i * i

def timing(f):
    """Decorator that prints the wall-clock time of each call in milliseconds."""
    @functools.wraps(f)
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print('%s function took %0.3f ms' % (f.__name__, (time2 - time1) * 1000.0))
        return ret
    return wrap

@timing
def do_work(process_count, items, process_type, method):
    """Run items-1 tasks of `method` on a pool of `process_count` workers."""
    if process_type == 'process':
        pool = Pool(processes=process_count)
    else:
        pool = ThreadPool(processes=process_count)
    task = square if method == 'square' else fileio
    multiple_results = [pool.apply_async(task, (a,)) for a in range(1, items)]
    result = [res.get() for res in multiple_results]
    # Release the workers; this shutdown is included in the timed call
    pool.close()
    pool.join()
    return result

if __name__ == '__main__':
    # fileio() writes its scratch files into this directory
    if not os.path.isdir('./test'):
        os.makedirs('./test')
    do_work(50, 300, 'thread', 'fileio')
    do_work(50, 300, 'process', 'fileio')
    do_work(50, 2000, 'thread', 'square')
    do_work(50, 2000, 'process', 'square')
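Because single timings like these are noisy, a more careful comparison would repeat each measurement. A sketch using the standard library's `timeit.repeat` (which accepts a callable) to report the best and median of several runs, assuming Python 3; `summarize` is a hypothetical helper name:

```python
import statistics
import timeit

def summarize(fn, repeat=5):
    """Call fn() `repeat` times; return (best_ms, median_ms) of the run times."""
    times = timeit.repeat(fn, repeat=repeat, number=1)
    ms = [t * 1000.0 for t in times]
    return min(ms), statistics.median(ms)

if __name__ == '__main__':
    best, median = summarize(lambda: sum(i * i for i in range(100000)))
    print('best %0.3f ms, median %0.3f ms' % (best, median))
```

To apply it to the benchmark, one could pass something like `lambda: do_work(50, 300, 'thread', 'fileio')`; the `timing` decorator would still print each individual run.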