13

I want to parallelize a for loop in Octave on a single machine (not a cluster). A while ago I asked a question about parallel versions of Octave: parallel computing in octave

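The answer suggested that I download a parallel computing package, which I did. The package seems to be aimed mostly at cluster computing. It does mention single-machine parallel computing, but it is not clear how to run even a parallel loop.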

I also found another question on SO, but I did not find a good answer there for parallelizing loops in Octave: Running parts of a loop in parallel with Octave?

Does anyone know where I can find an example of running a for loop in parallel in Octave?

4

3 Answers

14

I am computing a large number of RGB histograms. I need to use explicit loops to do it, so computing each histogram takes noticeable time. For this reason, running the computations in parallel makes sense. In Octave there is an (experimental) function, parcellfun, written by Jaroslav Hajek, that can be used to do this.

My original loop

histograms = zeros(size(files,2), bins^3);
% calculate histogram for each image
for c = 1 : size(files,2)
  I = imread(fullfile(dir, files{c}));
  h = myhistRGB(I, bins);
  histograms(c, :) = h(:); % change to 1D vector
end

To use parcellfun, I need to refactor the body of my loop into a separate function.

function histogram = loadhistogramp(file)
  I = imread(fullfile('.', file));
  h = myhistRGB(I, 8);
  histogram = h(:); % change to 1D vector
end

Then I can call it like this:

histograms = parcellfun(8, @loadhistogramp, files);
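As a minimal, self-contained sketch of the same pattern (not the author's actual code): the example below assumes the parallel package is installed, since newer releases ship parcellfun there rather than in general. `fake_hist` is a hypothetical stand-in for the per-file work. It asks for non-uniform output and stacks the returned column vectors explicitly, so the layout of the result does not depend on how parcellfun concatenates them.

```octave
pkg load parallel   % assumption: the parallel package is installed

% Hypothetical stand-in for the per-file work: returns a 3x1 column vector.
fake_hist = @(n) n * [1; 2; 3];

% One cell per job, like the cell array of file names above.
inputs = num2cell(1:4);

% Run the jobs in 2 subprocesses and collect each result in its own cell.
cols = parcellfun(2, fake_hist, inputs, "UniformOutput", false);

% Stack the column vectors side by side: one column per input.
H = [cols{:}];
```

Here `H` has one column per job, which matches the "only transposed" layout noted for the benchmark results.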

I did a small benchmark on my computer, which has 4 physical cores with Intel Hyper-Threading enabled.

My original code

tic(); histograms2 = loadhistograms('images.txt', 8); toc();
warning: your version of GraphicsMagick limits images to 8 bits per pixel
Elapsed time is 107.515 seconds.

With parcellfun

octave:1> pkg load general; tic(); histograms = loadhistogramsp('images.txt', 8); toc();
parcellfun: 0/178 jobs donewarning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
parcellfun: 178/178 jobs done
Elapsed time is 29.02 seconds.

The results from the parallel and serial versions were the same (only transposed).

octave:6> sum(sum((histograms'.-histograms2).^2))
ans = 0

When I repeated this several times, the running times were pretty much the same every time. The parallel version ran in around 30 seconds (+- approx. 2 s) with 4, 8, and also 16 subprocesses.
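A comparison like this can be scripted. The sketch below is only an illustration under assumptions: it requires the parallel package, and `work` is a hypothetical CPU-bound task rather than the histogram code, so the timings it prints will not match the numbers above.

```octave
pkg load parallel   % assumption: the parallel package is installed

% Hypothetical CPU-bound task standing in for one histogram computation.
work = @(n) sum(sin((1:n) .^ 2));

% 64 identical jobs, one per cell.
jobs = num2cell(repmat(2e5, 1, 64));

% Time the same batch with different numbers of subprocesses.
for np = [2, 4, 8]
  t0 = tic();
  r = parcellfun(np, work, jobs);
  printf("%2d processes: %.2f s\n", np, toc(t0));
end
```

Going beyond the number of physical cores will not necessarily help, which is consistent with the flat timings reported for 4, 8 and 16 subprocesses.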

answered 2013-11-05T20:55:03.297
12

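Octave loops are slow, slow, slow, and you are far better off expressing things in terms of array operations. Let's take as an example evaluating a simple trig function over a 2d domain, as in this 3d octave graphics example (but with a more realistic number of points for computation, as opposed to plotting):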

vectorized.m:

tic()
x = -2:0.01:2;
y = -2:0.01:2;
[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);
toc()

Converting it to for loops gives us forloops.m:

tic()
x = -2:0.01:2;
y = -2:0.01:2;
z = zeros(401,401);
for i=1:401
    for j=1:401
        lx = x(i);
        ly = y(j);
        z(i,j) = sin(lx^2 - ly^2);
    endfor        
endfor
toc()

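Note that the vectorized version already "wins" by being simpler and clearer to read, but there is another important advantage, too: the timings are dramatically different.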

$ octave --quiet vectorized.m 
Elapsed time is 0.02057 seconds.

$ octave --quiet forloops.m 
Elapsed time is 2.45772 seconds.

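So if you were using for loops, and you had perfect parallelism with no overhead, you would have to split the work across 119 processors just to break even with the version without for loops!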
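The break-even figure is just the ratio of the two timings above. The sketch below recomputes it and also checks that the two scripts really compute the same values. One detail it makes explicit: meshgrid puts x along the columns, so the loop result `z(i,j)` is the transpose of the vectorized one. The variable names `z_vec`, `z_loop` and `ratio` are introduced here for illustration only.

```octave
x = -2:0.01:2;
y = -2:0.01:2;

% Vectorized version, as in vectorized.m.
[xx, yy] = meshgrid(x, y);
z_vec = sin(xx.^2 - yy.^2);

% Loop version, as in forloops.m.
z_loop = zeros(numel(x), numel(y));
for i = 1:numel(x)
  for j = 1:numel(y)
    z_loop(i,j) = sin(x(i)^2 - y(j)^2);
  end
end

% meshgrid puts x along columns, so compare against the transpose.
z_vec_t = z_vec.';
max_diff = max(abs(z_loop(:) - z_vec_t(:)));

% Break-even processor count from the timings quoted above.
ratio = 2.45772 / 0.02057;
printf("max difference: %g, speed ratio: %.1f\n", max_diff, ratio);
```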

Don't get me wrong, parallelism is great, but first get things working efficiently in serial.

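Almost all of Octave's built-in functions are already vectorized, in the sense that they operate equally well on scalars or on entire arrays, so it is often easy to convert things to array operations instead of working element by element. For the times when it is not so easy, you will generally find that utility functions already exist to help you (like meshgrid, which generates a 2d grid from the Cartesian product of 2 vectors).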
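To make the meshgrid description concrete, here is a small sketch with toy vectors (the values are chosen only for illustration). It also shows that broadcasting, which Octave applies automatically to a row and a column vector, gives the same grid of sums without building the intermediate matrices.

```octave
x = [1 2 3];
y = [10 20];

% meshgrid replicates x along rows and y along columns:
% xx = [1 2 3; 1 2 3], yy = [10 10 10; 20 20 20]
[xx, yy] = meshgrid(x, y);

% Element-wise operation on whole arrays, no loop needed.
s = xx + yy;

% Same result via broadcasting a 1x3 row against a 2x1 column.
s_bcast = x + y.';

% Built-ins such as sin accept a scalar or a whole array alike.
a = sin(0.5);
b = sin(xx);
```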

answered 2012-05-10T22:25:51.630
4

Usage examples for pararrayfun can now be found here: http://wiki.octave.org/Parallel_package
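For orientation, a minimal sketch of what such a call can look like, assuming the parallel package is installed. The squaring function is only a placeholder for real per-element work, and `nproc` is Octave's built-in for the number of available processors.

```octave
pkg load parallel   % assumption: the parallel package is installed

% Placeholder for real per-element work.
fun = @(x) x^2;

% Apply fun to each element of 1:10, spread over nproc subprocesses.
squares = pararrayfun(nproc, fun, 1:10);
```

For a function this cheap the cost of starting subprocesses dominates, so the parallel call only pays off when each element takes substantial time.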

answered 2014-08-21T17:35:43.020