performance - 带有 if 条件的 for 循环向量化变慢

Question

我有一个 for 循环，我正在对其进行矢量化处理。问题是，一旦矢量化，代码会慢 3 倍。原始代码是热力学算法的一部分，是：

Matrix=rand(10,10,20);
someMatrix=rand(50);
m=5; n=6; CONSTj=9; CONSTk=10; maxGrid=5;
Total=0;

for var=0:maxGrid
    Factor=1;
    p=CONSTj-var;
    q=CONSTk-var;

    if p>=1 && q>=1
        Factor=Matrix(n,m,p)*Matrix(m,n,q);
    elseif p>=1 && q<1
        Factor=Matrix(n,m,p);
    elseif p<1 && q>=1
        Factor=Matrix(m,n,q);
    end

    Total=Total+Factor*(someMatrix(m)^var);
end

我将其矢量化为：

Matrix=rand(10,10,20);
someMatrix=rand(50);
m=5; n=6; CONSTj=9; CONSTk=10; maxGrid=5;

var=(0:maxGrid)';
Factor=ones(maxGrid+1,1);
tempoJ=zeros(maxGrid+1,1);
tempoK=zeros(maxGrid+1,1);
p=CONSTj-var;
q=CONSTk-var;

index1 = find(p>=1 & q>=1);
index2 = find(p>=1 & q<1 );
index3 = find(p<1  & q>=1);

tempoJ(index1)=squeeze(Matrix(n,m,p(index1)));
tempoJ(index2)=squeeze(Matrix(n,m,p(index2)));
tempoK(index1)=squeeze(Matrix(m,n,q(index1)));
tempoK(index3)=squeeze(Matrix(m,n,q(index3)));

Factor(index1)=tempoJ(index1).*tempoK(index1);
Factor(index2)=tempoJ(index2);
Factor(index3)=tempoK(index3);

Total=Factor.*(someMatrix(m).^var);
Total=sum(Total);

探查器说找到，squeeze和sum是最耗时的函数。我相信可以做一些事情来从 if 语句中获取信息，但是如果不更改索引，我找不到更简单的方法。

score 0 · Accepted Answer

好的，这不是答案，但评论中没有足够的空间让我全部输入。

这是我使用的测试代码。基于OP提供的内容。请注意，我将提高到 15，因此和maxGrid中会有一些负值。我还增加了其他矩阵的大小。我也删除了和函数调用。pqfindsqueeze

结果输出是这样的：

Elapsed time is 0.078117 seconds.
Elapsed time is 0.096412 seconds.

最终，这表明代码的第一个版本运行得更快。

tic
Matrix=rand(100,100,200); 
someMatrix=rand(500); 
m=5; n=6; CONSTj=9; CONSTk=10; maxGrid=15; 
Total=0; 

for var=0:maxGrid 
    Factor=1; 
    p=CONSTj-var; 
    q=CONSTk-var; 

    if p>=1 && q>=1 
        Factor=Matrix(n,m,p)*Matrix(m,n,q); 
    elseif p>=1 && q<1 
        Factor=Matrix(n,m,p); 
    elseif p<1 && q>=1 
        Factor=Matrix(m,n,q); 
    end 

    Total=Total+Factor*(someMatrix(m)^var); 
end 
toc

tic
Matrix=rand(100,100,200); 
someMatrix=rand(500); 
m=5; n=6; CONSTj=9; CONSTk=10; maxGrid=15; 

var=(0:maxGrid)'; 
Factor=ones(maxGrid+1,1); 
tempoJ=zeros(maxGrid+1,1); 
tempoK=zeros(maxGrid+1,1); 
p=CONSTj-var; 
q=CONSTk-var; 

% index1 = find(p>=1 & q>=1); 
index1 = ((p>=1) & (q>=1));
% index2 = find(p>=1 & q<1 ); 
index2 = ((p>=1)&(q<1));
% index3 = find(p<1  & q>=1); 
index3 = ((p<1)&(q>=1));

tempoJ(index1)=(Matrix(n,m,p(index1))); 
tempoJ(index2)=(Matrix(n,m,p(index2))); 
tempoK(index1)=(Matrix(m,n,q(index1))); 
tempoK(index3)=(Matrix(m,n,q(index3))); 

Factor(index1)=tempoJ(index1).*tempoK(index1); 
Factor(index2)=tempoJ(index2); 
Factor(index3)=tempoK(index3); 

Total=Factor.*(someMatrix(m).^var); 
Total=sum(Total); 
toc

score 0 · Accepted Answer

我实际上不会对这段代码进行矢量化。最新版本的 Matlab 具有出色的 JIT 编译器，并且矢量化代码没有与旧版本相同的性能改进。

原始代码更容易理解和维护。如果您绝对需要性能，我会考虑为此编写一个 MEX 函数（用 C 语言）。

但是，如果您打算对代码进行矢量化，我会消除不必要的find和squeeze调用（正如@mutzmatron 所建议的那样）。代替：

index1 = find(p>=1 & q>=1);
index2 = find(p>=1 & q<1 );
index3 = find(p<1  & q>=1);

tempoJ(index1)=squeeze(Matrix(n,m,p(index1)));
tempoJ(index2)=squeeze(Matrix(n,m,p(index2)));
tempoK(index1)=squeeze(Matrix(m,n,q(index1)));
tempoK(index3)=squeeze(Matrix(m,n,q(index3)));

和：

p_gt1 = p >= 1;
q_gt1 = q >= 1;

index1 =  p_gt1 &  q_gt1;
index2 =  p_gt1 & ~q_gt1;
index3 = ~p_gt1 &  q_gt1;

tempoJ(index1)=Matrix(n,m,p(index1));
tempoJ(index2)=Matrix(n,m,p(index2));
tempoK(index1)=Matrix(m,n,q(index1));
tempoK(index3)=Matrix(m,n,q(index3));

没必要squeeze。行的 RHS 与squeezeLHS 具有相同数量的元素，因此 Matlab 很乐意将它们一对一映射并填充。

即使有了这种改进，矢量化代码仍然比原来的 for 循环慢一些。我每次都使用TIMEIT：

% 'foo'     your original for loop
% 'foo_v0'  the original vector code, with find and squeeze
% 'foo_v'   the modified vector code, without find/squeeze

>> timeit(@()foo(Matrix,someMatrix))
ans =
   1.1482e-05

>> timeit(@()foo_v(Matrix,someMatrix))
ans =
   2.8039e-05

>> timeit(@()foo_v0(Matrix,someMatrix))
ans =
   1.3256e-04

诚然，这是您为解释问题而简化的代码；您可能会在实际问题上得到不同的结果。但是，请注意，Matlab 在编译代码方面变得更加智能，因此矢量化不再像过去那样产生速度提升。

performance - 带有 if 条件的 for 循环向量化变慢

2 回答 2

Related

Reference