p
您可能已经知道,您应该在循环之前预先分配:
p = zeros(Z, nR);
这可以防止数组p
在每次迭代时增长,从而极大地加快循环速度。
您可以通过以下方式对整个事物进行矢量化bsxfun
:
% C ≣ M×N×R array of all products Rmn·J0m
C = bsxfun(@times, Rmn, J0m);
% D ≣ M×N×R×Z array of all products C·sinanz
D = bsxfun(@times, C, permute(sinanz, [3 1 4 2]));
% Sum in M and N directions, and re-order
p = permute(sum(sum(D,1),2), [4 3 1 2]);
但我怀疑它会更快;MATLAB(阅读:BLAS)处理二维矩阵非常快,但处理更多维数组通常不是非常好。
我建议你阅读一下bsxfun
;这也是按照您描述的方式将J0m
数组减少为M
×的方法。R
当然,您可以permute
通过正确定义变量来摆脱 s,因此让我们在循环代码和矢量化代码的“理想”版本中进行一个小测试:
%% Initialize some variables
% Number of tests
TESTS = 1e4;
% Your dimensions
M = 5; nR = 4;
N = 2; Z = 3;
% Some dummy data
Rmn = rand(M,N);
sinanz = rand(N,Z);
J0m = rand(M,nR); % NOTE: J0m doesn't have to be 3D anymore by using bsxfun
%% Test 1: your own version, optimized
% Start test
start = tic;
% Repeat the test a couple of times to get a good average
for ii = 1:TESTS
p1 = zeros(Z,nR);
for iR = 1:nR
p1(:, iR) = sum( bsxfun(@times,Rmn,J0m(:, iR)), 1 )*sinanz;
end
end
stop = toc(start);
% Average execution time
fprintf(1, 'Average time of looped version: %f seconds.\n', stop/TESTS);
%% Vectorized version, using 4D arrays:
% Don't make the permutes part of the test
J0m = permute(J0m, [1 3 2]);
sinanz = permute(sinanz, [3 1 4 2]);
% Start test
start = tic;
% Repeat the test a couple of times to get a good average
for ii = 1:TESTS
C = bsxfun(@times, Rmn, J0m);
D = bsxfun(@times, C, sinanz);
p2 = sum(sum(D,1),2);
end
stop = toc(start);
% Average execution time
fprintf(1, 'Average time of vectorized version: %f seconds.\n', stop/TESTS);
%% Check for equality
maxDifference = max(max(p1 - squeeze(p2)'))
结果:
Average time of looped version : 0.000054 seconds.
Average time of vectorized version: 0.000023 seconds.
maxDifference =
4.440892098500626e-016
这看起来不错!但是,使用
M = 50; nR = 40;
N = 20; Z = 30;
你得到
Average time of looped version : 0.000708 seconds.
Average time of vectorized version: 0.009835 seconds.
maxDifference =
8.526512829121202e-014
所以矢量化版本比循环变体慢了一个数量级。
当然,您的里程可能会有所不同,但带回家的信息是,您应该预期这种差异会随着尺寸的增加而变得越来越差。
所以,总而言之:
- 如果您的尺寸很大,请坚持使用循环。如果它们很小,也可以使用循环 - 它比矢量化版本在眼睛上容易得多:)
- 确保预先分配
p
变量
- 用于
bsxfun
减少内存占用(但不会大大提高速度)