
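I implemented a parallel version of the Jacobi method for solving linear systems. While testing it, I noticed that the parallel function takes far longer to run than the sequential one. This is strange, because the Jacobi method should be faster when run in parallel.

I think I am doing something wrong in my code: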

function [x,niter,resrel] = Parallel_Jacobi(A,b,TOL,MAXITER)
[n, m] = size(A); 
D = 1./spdiags(A,0);
B = speye(n)-A./spdiags(A,0);
C= D.*b;
x0=sparse(zeros(length(A),1));
spmd

    cod_vett=codistributor1d(1,codistributor1d.unsetPartition,[n,1]);
    cod_mat=codistributor1d(1,codistributor1d.unsetPartition,[n,m]);

    B= codistributed(B,cod_mat);
    C= codistributed(C,cod_vett);
    x= codistributed(B*x0 + C,cod_vett);

    Niter = 1; 
    TOLX = TOL;  
    while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER)
        if(TOL*norm(x,Inf) > realmin)
            TOLX = norm(x,Inf)*TOL;
        else
            TOLX = realmin;
        end    
        x0 = x;
        x = B*x0 + C;
        Niter=Niter+1;
    end
end
niter = Niter{1};  % Niter is a Composite after spmd; return the copy held by worker 1
x=gather(x);
end

Here is the test:

%sequential Jacobi
format long;
A = gallery('poisson',20);
tic;
x= jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 0.009054 seconds.
%parallel Jacobi
format long;
A = gallery('poisson',20);
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 11.484130 seconds.

I timed the function with parpool set to 1, 2, 3, and 4 workers (I have a quad-core processor). The results are as follows:

%Test
format long;
A = gallery('poisson',20);
delete(gcp('nocreate'));
tic
%parpool(1/2/3/4) means that I ran 4 tests that differ only in the
%argument passed to parpool: first parpool(1), then parpool(2), and so on.
parpool(1/2/3/4);
toc
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;

4 workers: parpool = 13.322899 seconds, function = 23.772271 seconds

3 workers: parpool = 10.911769 seconds, function = 16.402633 seconds

2 workers: parpool = 9.371729 seconds, function = 12.945154 seconds

1 worker: parpool = 8.460357 seconds, function = 7.982958 seconds

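The fewer the workers, the better the time. As @Adriaan said, this is probably due to overhead.

Does this mean that, in this case, the sequential function will always be faster than the parallel one? Or is there a better way to implement the parallelism?

In this question, it is said that parallel performance is better when the number of iterations is large. In my case, this test takes only 32 iterations.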
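To get a feel for how little work each iteration does on this test matrix, the cost of a single sparse matrix-vector product can be measured directly. The following is only a rough sketch and is not part of my original code: the variable names `d`, `Bseq`, `xTest`, and `tMatvec` are made up for illustration, and the iteration matrix is built with `spdiags` rather than with the element-wise division used in my functions.

```matlab
% Rough sketch: time one B*x product for the 400-by-400 test problem.
A = gallery('poisson', 20);               % same test matrix as in the post
n = size(A, 1);                           % n = 400
d = full(diag(A));                        % diagonal entries of A
Bseq = speye(n) - spdiags(1 ./ d, 0, n, n) * A;   % Jacobi iteration matrix
xTest = ones(n, 1);                       % arbitrary test vector

tMatvec = timeit(@() Bseq * xTest);       % median time of one product, in seconds
fprintf('n = %d, nnz(B) = %d, one B*x takes about %.2e s\n', ...
        n, nnz(Bseq), tMatvec);
```

If the time per product is tiny compared with the multi-second timings of the parallel runs, that would be consistent with the overhead explanation, although it does not prove it.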

The sequential implementation of the Jacobi method is as follows:

function [x,niter,resrel] = jacobi(A,b,TOL,MAXITER)
n = size(A,1); 
D = 1./spdiags(A,0);
B = speye(n)-A./spdiags(A,0);
C= D.*b;

x0=sparse(zeros(length(A),1));
x = B*x0 + C;
Niter = 1; 
TOLX = TOL;  

while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER) 
    if(TOL*norm(x,Inf) > realmin)
        TOLX = norm(x,Inf)*TOL;
    else
        TOLX = realmin;
    end    

    x0 = x;
    x = B*x0 + C;

    Niter=Niter+1;
end
niter = Niter;  % return the iteration count (resrel is never computed in this code)
end
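
For reference, the update that both functions apply in each pass of the loop can be written out as below. This is a sketch in standard notation: the symbols $D_A$ and $x^{(k)}$ are introduced here and do not appear in the code, where the variable `D` holds the reciprocals of the diagonal entries, `B` is the iteration matrix, and `C` is the constant term.

```latex
\begin{aligned}
D_A &= \operatorname{diag}(A), \\
B &= I - D_A^{-1} A, \qquad C = D_A^{-1} b, \\
x^{(k+1)} &= B\,x^{(k)} + C
           = x^{(k)} + D_A^{-1}\bigl(b - A\,x^{(k)}\bigr).
\end{aligned}
```

Each iteration therefore costs one sparse matrix-vector product and one vector addition, plus the infinity norms evaluated in the loop condition.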

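I also timed the code with the timeit function; the results are as follows (same input as before):

4 workers: 11.693473075964102 seconds

3 workers: 9.221281335264003 seconds

2 workers: 9.150417240778545 seconds

1 worker: 6.047181992020434 seconds

Sequential: 0.002893932969688 seconds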
