I am computing a large number of RGB histograms. I need to use explicit loops to do it, so computing each histogram takes noticeable time. For this reason, running the computations in parallel makes sense. In Octave there is an (experimental) function, parcellfun, written by Jaroslav Hajek, that can be used for this.
My original loop:
histograms = zeros(size(files,2), bins^3);
% calculate the histogram for each image
for c = 1 : size(files,2)
  I = imread(fullfile(dir, files{c}));
  h = myhistRGB(I, bins);
  histograms(c, :) = h(:); % flatten to a 1D vector
end
To use parcellfun, I need to refactor the body of my loop into a separate function.
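function histogram = loadhistogramp(file)
  % load one image from the current directory and return its RGB histogram
  I = imread(fullfile('.', file));
  h = myhistRGB(I, 8); % 8 bins per channel, hard-coded here
  histogram = h(:); % flatten to a 1D column vector
end
Then I can call it like this: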
histograms = parcellfun(8, @loadhistogramp, files);
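The call above relies on parcellfun's default output handling. A minimal sketch of a more explicit variant follows, assuming parcellfun accepts the same 'UniformOutput' option as cellfun and is loaded from the general package as in this post (newer installations may ship it in a different package). The function fake_histogram and the numeric inputs are hypothetical stand-ins for loadhistogramp and the file names, so that the sketch runs without any images.

```octave
pkg load general;   % assumption: parcellfun is provided by this package, as in the post

bins  = 8;          % bins per colour channel
nproc = 2;          % number of subprocesses to fork

% Hypothetical stand-in for the per-image work: returns a bins^3 x 1 column.
% The anonymous function captures bins, so it does not need to be hard-coded.
fake_histogram = @(k) k * ones(bins^3, 1);
inputs = {1, 2, 3, 4};   % stand-ins for the file names

% With 'UniformOutput' set to false, each result comes back in its own cell.
cols = parcellfun(nproc, fake_histogram, inputs, 'UniformOutput', false);

% Concatenate the columns and transpose: one row per input,
% the same layout as the matrix built by the serial loop.
histograms = [cols{:}]';
```

Collecting the results in a cell array and concatenating them by hand makes the final layout explicit, at the cost of one extra line.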
I did a small benchmark on my computer. It has 4 physical cores with Intel Hyper-Threading enabled (8 logical cores).
My original code:
tic(); histograms2 = loadhistograms('images.txt', 8); toc();
warning: your version of GraphicsMagick limits images to 8 bits per pixel
Elapsed time is 107.515 seconds.
With parcellfun:
octave:1> pkg load general; tic(); histograms = loadhistogramsp('images.txt', 8); toc();
parcellfun: 0/178 jobs donewarning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
parcellfun: 178/178 jobs done
Elapsed time is 29.02 seconds.
The results from the parallel and serial versions were the same (only transposed):
octave:6> sum(sum((histograms'.-histograms2).^2))
ans = 0
When I repeated the benchmark several times, the running times stayed nearly the same. The parallel version took around 30 seconds (± approx. 2 s) with 4, 8, and 16 subprocesses alike, which is about 3.7 times faster than the serial run.
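A comparison across subprocess counts like the one described above could be scripted roughly as follows. This is only a sketch: the work function is a hypothetical CPU-bound stand-in, and in practice @loadhistogramp and files would take the place of work and inputs. It again assumes parcellfun is loaded from the general package.

```octave
pkg load general;   % assumption: parcellfun is provided by this package, as in the post

% Hypothetical CPU-bound stand-in for computing one histogram.
work   = @(k) sum(sum(rand(100) * rand(100)));
inputs = num2cell(1:64);          % 64 dummy jobs

counts = [4 8 16];                % subprocess counts to compare
times  = zeros(size(counts));

for i = 1:numel(counts)
  t0 = tic();                     % start a timer for this run
  r = parcellfun(counts(i), work, inputs);
  times(i) = toc(t0);             % elapsed wall-clock seconds
  printf('%2d subprocesses: %.2f s\n', counts(i), times(i));
end
```

With a toy job this small, the cost of starting the subprocesses may dominate, so the numbers are only meaningful once the real per-image function is plugged in.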