matlab - Multiple Tesla K80 GPUs and parfor loop

Question

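I have been given a computer with four Tesla K80 GPUs. I am trying to speed up FFT computations with parfor loops from MATLAB's Parallel Computing Toolbox (PCT), but the code runs slower instead.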

Here is what I am trying:

    % pupil is a 512x512 array

    parfor zz = 1:4
        gd = gpuDevice;
        d{zz} = gd.Index;
        probe{zz} = gpuArray(pupil);
        Essai{zz} = gpuArray(pupil);
    end

    tic;
    parfor ii = 1:4
        gd2 = gpuDevice;
        d2{ii} = gd2.Index;
        for i = 1:100
            Essai{ii} = fftn(probe{ii});
        end
    end
    toc
    %%

Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Elapsed time is 1.805763 seconds.
Elapsed time is 1.412928 seconds.
Elapsed time is 1.409559 seconds.

Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Elapsed time is 0.606602 seconds.
Elapsed time is 0.297850 seconds.
Elapsed time is 0.294365 seconds.
%%
tic; for i = 1:400; Essai{1} = fftn( probe{1} ); end; toc
Elapsed time is 0.193579 seconds !!!
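One thing worth checking before comparing these numbers: GPU operations on gpuArray data run asynchronously, so a plain tic/toc around a loop of fftn calls may stop the clock before the FFTs have actually finished. The sketch below shows two synchronized ways to time the serial case, using `wait(gpuDevice)` and `gputimeit` from PCT. The random 512x512 matrix is a hypothetical stand-in for the question's `pupil`, and no particular timing result is implied.

```matlab
% Hypothetical stand-in for the question's 512x512 pupil array
pupil = rand(512);
probe = gpuArray(pupil);
dev = gpuDevice;            % currently selected GPU

% Option 1: tic/toc, but block until the GPU has finished before toc
wait(dev);                  % make sure the upload above has completed
tic;
for k = 1:400
    Essai = fftn(probe);
end
wait(dev);                  % FFTs are queued asynchronously; wait for them
tSerial = toc;

% Option 2: gputimeit synchronizes and repeats the measurement itself
tOne = gputimeit(@() fftn(probe));

fprintf('400 FFTs with tic/toc + wait: %.4f s\n', tSerial);
fprintf('1 FFT with gputimeit: %.6f s (x400 = %.4f s)\n', tOne, 400 * tOne);
```

If the synchronized serial time comes out noticeably larger than the unsynchronized one, the original 0.19 s figure was measuring kernel launches rather than completed transforms.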

Why is it faster with 8 workers, given that in principle I only store variables on 4 of the 8 GPUs?

Also, how can I use a Tesla K80 as a single GPU?
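For context on the last question: a Tesla K80 board carries two GK210 GPUs, and CUDA (and therefore MATLAB) exposes each of them as a separate device. That is consistent with four boards appearing as 8 GPUs. The sketch below only lists what `gpuDeviceCount` and `gpuDevice` report on a given machine; it does not merge the two halves of a board into one device.

```matlab
% Enumerate the CUDA devices visible to this MATLAB session.
% A Tesla K80 board holds two GK210 GPUs, so each board is expected
% to show up as two separate entries in this list.
nDev = gpuDeviceCount;
names = cell(nDev, 1);
for k = 1:nDev
    % Selecting a device can invalidate gpuArrays held on the previously
    % selected one, so run this before creating any GPU data.
    g = gpuDevice(k);
    names{k} = g.Name;
    fprintf('Device %d: %s, %.1f GB\n', g.Index, g.Name, g.TotalMemory / 1e9);
end
gpuDevice(1);   % return to the first device afterwards
```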

Thanks, Nicolas

score 1 · Accepted Answer

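I doubt that parfor is well suited to multi-GPU systems. If speed is critical and you want to make full use of the GPUs, I suggest writing your own small CUDA program with the cuFFT library: http://docs.nvidia.com/cuda/cufft/#multiple-GPU-cufft-transforms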

Here is how to write a MEX file that contains CUDA code: http://www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html

score 0 · Accepted Answer

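Thank you very much for the quick reply and the links! I was indeed trying to avoid CUDA, but it seems to be the best option for distributing the FFTs. I had thought that parfor and spmd were good tools for working with multiple GPUs, though.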
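For comparison with the parfor version, a common MATLAB pattern for multi-GPU work is an spmd block with one pool worker bound to one GPU. Data is created directly on each worker's device and stays there as a Composite, instead of (as is likely in the question's code) being returned to the client after the first parfor and shipped out again in the second. This is a sketch under stated assumptions: the 'local' profile allows `gpuDeviceCount` workers, `rand(512)` is a hypothetical stand-in for `pupil`, and nothing here shows that small 512x512 FFTs will actually run faster across GPUs than on one.

```matlab
% One pool worker per visible GPU
nGPU = gpuDeviceCount;
pool = gcp('nocreate');
if ~isempty(pool) && pool.NumWorkers ~= nGPU
    delete(pool);           % existing pool has the wrong size
    pool = [];
end
if isempty(pool)
    parpool('local', nGPU);
end

spmd
    gd = gpuDevice(labindex);      % bind worker k to GPU k
    probe = gpuArray(rand(512));   % stand-in data, created on this GPU
    wait(gd);
    t0 = tic;
    for k = 1:100
        Essai = fftn(probe);
    end
    wait(gd);                      % finish all queued FFTs before toc
    tWorker = toc(t0);
    devIdx = gd.Index;
end

% tWorker and devIdx are Composites: fetch one value per worker
tAll = zeros(1, nGPU);
idx = zeros(1, nGPU);
for w = 1:nGPU
    tAll(w) = tWorker{w};
    idx(w) = devIdx{w};
end
fprintf('Worker %d on GPU %d: %.4f s for 100 FFTs\n', [1:nGPU; idx; tAll]);
```

Because `probe` and `Essai` remain on the workers between spmd blocks, later blocks can reuse them without another host-to-GPU transfer.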
