
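I have written two functions: one generates data and the other processes it. Processing is time-consuming, so I want to do it in parallel threads, but I have run into some problems. First, here is my program: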

result = zeros(1, 10);

matlabpool open local 2
spmd
    for a = 1:5
        data = generate_data();
        display(sprintf('Received data on CPU%d: %d', labindex, data));
        result(end + 1) = process_data(data);
    end
    display(sprintf('All done on CPU%d', labindex));
end
matlabpool close

And here is the log it produces:

Starting matlabpool using the 'local' profile ... connected to 2 workers.
Lab 1: 
  Received data on CPU1: 100
Lab 2: 
  Received data on CPU2: 100
Lab 1: 
  Received data on CPU1: 101
  Received data on CPU1: 102
  Received data on CPU1: 103
  Received data on CPU1: 104
  All done on CPU1
Lab 2: 
  Received data on CPU2: 101
  Received data on CPU2: 102
  Received data on CPU2: 103
  Received data on CPU2: 104
  All done on CPU2
Sending a stop signal to all the workers ... stopped.

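These are the problems I have:

  1. The values returned by generate_data are the same for both threads. They should be different: the threads should process different data, not process the same data twice. I cannot generate the whole data set at once and use getLocalPart.

  2. The variable result is not a 1x10 matrix of doubles but a 1x2 Composite. I have read about (co)distributed arrays, but that did not help me. What should I do to get a 1x10 matrix of doubles?

  3. What should I do so that CPU1 picks up CPU2's data once it has finished its own? In general, I have no idea how to do this.

  4. Is it possible to remove the "Lab 1:" and "Lab 2:" lines? They clutter my log :)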

Taking the above into account, the log (for a larger data set) should look like this:

Starting matlabpool using the 'local' profile ... connected to 2 workers.
Received data on CPU1: 100
Received data on CPU2: 101
Received data on CPU1: 102
Received data on CPU1: 103
Received data on CPU1: 104
Received data on CPU1: 105
Received data on CPU2: 106
Received data on CPU1: 107
Received data on CPU1: 108
Received data on CPU2: 109
All done on CPU1
All done on CPU2
Sending a stop signal to all the workers ... stopped.

1 Answer


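Why don't you use the simpler parfor? At the moment you are running the whole loop on every worker, whereas I assume you want the iterations of the loop to run in parallel.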

nIter = 10;
result = zeros(1, nIter);

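matlabpool open local 2

% Each iteration is handed to whichever worker is free.
parfor a = 1:nIter
    data = generate_data();
    fprintf('%s: processing set %i/%i\n', datestr(now), a, nIter);
    result(a) = process_data(data);  % sliced assignment keeps result a 1x10 double
end

matlabpool close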
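
To check the idea without the real functions, here is a minimal self-contained sketch. The two function handles are hypothetical stand-ins (assumptions, not the code from the question): generate_data is given the loop index so that every iteration receives different data, and process_data is only a placeholder for the expensive step. The sketch opens no pool explicitly; newer MATLAB releases use parpool in place of matlabpool, and parfor falls back to running serially when no pool is available.

```matlab
% Hypothetical stand-ins for the question's functions (assumptions only).
% Passing the loop index makes each iteration's data distinct.
generate_data = @(a) 99 + a;   % yields 100, 101, ..., 109
process_data  = @(d) d.^2;     % placeholder for the time-consuming step

nIter  = 10;
result = zeros(1, nIter);      % preallocated 1x10 double

parfor a = 1:nIter
    % feval makes it explicit that the handles are called, not indexed.
    data = feval(generate_data, a);
    % Sliced assignment: iteration a writes only element a.
    result(a) = feval(process_data, data);
end
```

After the loop, result is an ordinary 1x10 double on the client rather than a Composite, because parfor reassembles sliced output variables. Which worker runs which iteration is decided by MATLAB, so a worker that finishes early simply takes the next free iteration.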
answered 2013-01-22T14:32:07.563