我收到了一台配备 4xGPU 的 Tesla K80 的计算机,我正在尝试使用 Matlab PCT 的 parfor 循环来加快 FFT 的计算速度,但速度却更慢。
这是我正在尝试的:
% Pupil is based on a 512x512 array
parfor zz = 1:4
gd = gpuDevice;
d{zz} = gd.Index;
probe{zz} = gpuArray(pupil);
Essai{zz} = gpuArray(pupil);
end
tic;
parfor ii = 1:4
gd2 = gpuDevice;
d2{ii} = gd2.Index;
for i = 1:100
[Essai{ii}] = fftn(probe{ii});
end
end
toc
%%
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Elapsed time is 1.805763 seconds.
Elapsed time is 1.412928 seconds.
Elapsed time is 1.409559 seconds.
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Elapsed time is 0.606602 seconds.
Elapsed time is 0.297850 seconds.
Elapsed time is 0.294365 seconds.
%%
tic; for i = 1:400; Essai{1} = fftn( probe{1} ); end; toc
Elapsed time is 0.193579 seconds !!!
为什么打开 8 个工作人员的速度更快,因为原则上我只将变量存储到 4gpu 中(共 8 个)?
另外,如何将 Tesla K80 用作单个 GPU?
谢谢,尼古拉斯