I have a piece of code that does the following:
```
for each file (already read into RAM):
    call a function and obtain a result
add the results up and display
```
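The pseudocode above is a map (one independent call per file) followed by an element-wise sum. Here is a minimal serial sketch of that structure in Python. `analyze` and `analyze_all` are hypothetical names, and the toy arrays stand in for the files already held in memory:

```python
import numpy as np


def analyze(data):
    # Hypothetical stand-in for the real per-file function:
    # returns an array with the same shape as its input.
    return data * 2


def analyze_all(files):
    # Map: one independent call per file (this independence is what
    # makes the loop parallelizable).
    results = [analyze(f) for f in files]
    # Reduce: add the per-file results element-wise.
    return np.sum(results, axis=0)


if __name__ == '__main__':
    files = [np.ones((19, 19), dtype=int) for _ in range(3)]
    print(analyze_all(files))
```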
Each file can be analyzed in parallel. The function that analyzes each file is:
```python
import numpy as np

# Complexity = 1000*19*19 units of work
def fun(args):
    (a, b, p) = args
    for itr in range(1000):
        for i in range(19):
            for j in range(19):
                # The following random number generated depends on
                # latest values in (i-1, j), (i+1, j), (i, j-1) & (i, j+1)
                # cells of latest a and b arrays
                u = np.random.rand()
                if u < p:
                    a[i, j] += -1
                else:
                    b[i, j] += 1
    return a + b
```
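Before attributing the time to `multiprocessing`, it may help to time a single call of `fun` serially. Each call runs 1000 × 19 × 19 = 361,000 pure-Python loop iterations, and every iteration makes an `np.random.rand()` call and a NumPy element update. The sketch below uses the standard-library `timeit` module. The `iters` parameter and the `time_one_call` helper are illustrative additions, not part of the original code:

```python
import timeit

import numpy as np


def fun(args, iters=1000):
    # Same body as the original fun, except that the outer loop count
    # is a parameter (an illustrative change) so it can be shortened.
    (a, b, p) = args
    for itr in range(iters):
        for i in range(19):
            for j in range(19):
                u = np.random.rand()
                if u < p:
                    a[i, j] += -1
                else:
                    b[i, j] += 1
    return a + b


def time_one_call(iters=1000):
    a = np.random.randint(2, size=(19, 19))
    b = np.random.randint(2, size=(19, 19))
    p = np.random.rand()
    # number=1: run the call once and return the elapsed seconds.
    return timeit.timeit(lambda: fun((a, b, p), iters), number=1)


if __name__ == '__main__':
    print('one serial call: %.2f s' % time_one_call())
```

To estimate the pool's runtime, multiply the per-call time by 100 and divide by the number of worker processes. If the result is close to the observed 33-34 seconds, the per-element loop is the likely bottleneck, not process startup or pickling.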
I am using the `multiprocessing` package for the parallelism:
```python
import time
from multiprocessing import Pool, cpu_count

import numpy as np

# fun(args) is defined above, at module level in the same script,
# so that the worker processes can find it.

if __name__ == '__main__':
    t = time.time()
    with Pool(processes=cpu_count()) as pool:
        args = [None] * 100
        for i in range(100):
            a = np.random.randint(2, size=(19, 19))
            b = np.random.randint(2, size=(19, 19))
            p = np.random.rand()
            args[i] = (a, b, p)
        result = pool.map(fun, args)
    # Start at index 1 so that every result (including result[1]) is summed.
    for i in range(1, 100):
        result[0] += result[i]
    print(result[0])
    print(time.time() - t)
```
I have written equivalent code in MATLAB, using `parfor` and calling `fun` in each `parfor` iteration:
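```matlab
tic
args = cell(100, 1);
r = cell(100, 1);
parfor i = 1:100
    % randi(2, ...) draws from {1, 2}, whereas np.random.randint(2, ...)
    % in the Python version draws from {0, 1}.
    a = randi(2, 19, 19);
    b = randi(2, 19, 19);
    p = rand();
    args{i}.a = a;
    args{i}.b = b;
    args{i}.p = p;
    r{i} = fun(args{i});
end
for i = 2:100
    r{1} = r{1} + r{i};
end
disp(r{1});
toc
```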
The implementation of `fun` is:
```matlab
function [ ret ] = fun( args )
    a = args.a;
    b = args.b;
    p = args.p;
    for itr = 1:1000
        for i = 1:19
            for j = 1:19
                u = rand();
                if (u < p)
                    a(i, j) = a(i, j) + -1;
                else
                    b(i, j) = b(i, j) + 1;
                end
            end
        end
    end
    ret = a + b;
end
```
I find that MATLAB is very fast: it takes about 1.5 seconds on a dual-core processor, while the Python program takes about 33-34 seconds. Why is that?
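One way to investigate the gap is to measure the per-call cost of the operations inside the innermost loop. The sketch below uses `timeit` to compare `np.random.rand()` with the standard library's `random.random()`, and NumPy scalar indexing with nested-list indexing. No particular numbers are claimed here, because results vary by machine and library version:

```python
import random
import timeit

import numpy as np


def per_call_costs(n=100_000):
    arr = np.zeros((19, 19), dtype=int)
    lst = [[0] * 19 for _ in range(19)]
    # Fixed indices (3, 4) stand in for (i, j) in the real loop.
    totals = {
        'np.random.rand()': timeit.timeit(np.random.rand, number=n),
        'random.random()': timeit.timeit(random.random, number=n),
        'arr[i, j] += 1': timeit.timeit(
            'arr[3, 4] += 1', globals={'arr': arr}, number=n),
        'lst[i][j] += 1': timeit.timeit(
            'lst[3][4] += 1', globals={'lst': lst}, number=n),
    }
    # Convert total seconds into seconds per call.
    return {name: total / n for name, total in totals.items()}


if __name__ == '__main__':
    for name, sec in per_call_costs().items():
        print('%-18s %.3g us/call' % (name, sec * 1e6))
```

The whole run executes roughly 36 million inner-loop iterations (100 × 1000 × 19 × 19). Multiplying each per-call figure by that count, then dividing by the number of cores, gives a rough sense of how much of the 33-34 seconds each operation could account for.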
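EDIT: Many answers suggested that I should vectorize the random number generation. In fact, the random number that is generated depends on the latest `a` and `b` 2D arrays; I only used a plain `rand()` call to keep the program short and readable. In my actual program, the random number is always generated by looking at certain horizontal and vertical neighbors of cell (i, j), so it cannot be vectorized.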
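Even when the update at (i, j) depends on the latest neighbor values, the uniform random numbers themselves may be independent of the state, with only the threshold they are compared against changing. Under that assumption, all the uniforms for one call can be drawn in a single vectorized step, leaving only the state-dependent comparison in the loop. This is a minimal sketch. The neighbor rule in `neighbor_threshold` is hypothetical, as are both function names, because the program's real rule is not given:

```python
import numpy as np


def neighbor_threshold(a, b, i, j, p, n):
    # Hypothetical rule: scale p by the mean of a + b over the up, down,
    # left and right neighbours that lie inside the n x n grid.
    total, count = 0, 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < n and 0 <= nj < n:
            total += a[ni][nj] + b[ni][nj]
            count += 1
    return p * total / count


def fun_predrawn(args, iters=1000, seed=None):
    a, b, p = args
    n = len(a)
    rng = np.random.default_rng(seed)
    # Draw every uniform this call needs at once, then convert to nested
    # lists so the inner loop indexes plain Python objects.
    u = rng.random((iters, n, n)).tolist()
    a = np.asarray(a).tolist()  # copies; the caller's arrays are untouched
    b = np.asarray(b).tolist()
    for itr in range(iters):
        u_itr = u[itr]
        for i in range(n):
            u_row = u_itr[i]
            for j in range(n):
                # The threshold reads the *latest* neighbour values, so the
                # sequential dependence between cells is preserved.
                if u_row[j] < neighbor_threshold(a, b, i, j, p, n):
                    a[i][j] -= 1
                else:
                    b[i][j] += 1
    return np.array(a) + np.array(b)


if __name__ == '__main__':
    a = np.random.randint(2, size=(19, 19))
    b = np.random.randint(2, size=(19, 19))
    print(fun_predrawn((a, b, np.random.rand())))
```

This trades memory (1000 × 19 × 19 floats per call, held as Python lists) for fewer per-element library calls. It is only valid if the real rule compares a uniform draw against a state-dependent quantity. If the distribution itself changes with the neighbors, the draws cannot be taken in advance this way. Because `default_rng(None)` seeds from fresh OS entropy, each call also gets its own stream instead of the global NumPy state that forked `Pool` workers inherit.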