I have this plain serial code:
double eps;
double A[N][N][N];
...
for(i=1; i<=N-2; i++)
    for(j=1; j<=N-2; j++)
        for(k=1; k<=N-2; k++)
        {
            A[i][j][k] = (A[i-1][j][k]+A[i+1][j][k])/2.;
        }
for(i=1; i<=N-2; i++)
    for(j=1; j<=N-2; j++)
        for(k=1; k<=N-2; k++)
        {
            A[i][j][k] = (A[i][j-1][k]+A[i][j+1][k])/2.;
        }
for(i=1; i<=N-2; i++)
    for(j=1; j<=N-2; j++)
        for(k=1; k<=N-2; k++)
        {
            double e;
            e=A[i][j][k];
            A[i][j][k] = (A[i][j][k-1]+A[i][j][k+1])/2.;
            eps=Max(eps,fabs(e-A[i][j][k]));
        }
I need to turn it into parallel code using MPI.
I understand what to do with eps: it is a global variable whose value is needed on every node, so I create a local variable, compute it on each node, and either return the result from each node or do a reduce.
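To make the eps idea concrete, here is a minimal sketch in plain C that simulates it without MPI: each "rank" computes the largest change over its own chunk, and the per-rank values are then combined with a maximum. The names local_eps and combine_eps are hypothetical helpers introduced only for this illustration. In a real MPI program the combining loop would normally be replaced by a single reduction call such as MPI_Allreduce with the MPI_MAX operation.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute change between two equally sized chunks.
   In an MPI program each rank would call this on its own part of A.
   (Hypothetical helper, not part of the original code.) */
double local_eps(const double *before, const double *after, size_t n)
{
    double eps = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = before[i] - after[i];
        if (d < 0.0) d = -d;          /* absolute value without libm */
        if (d > eps) eps = d;
    }
    return eps;
}

/* Combine the per-rank maxima into the global maximum.
   This loop only stands in for a max-reduction across ranks. */
double combine_eps(const double *per_rank, int nranks)
{
    assert(nranks > 0);
    double eps = per_rank[0];
    for (int r = 1; r < nranks; r++)
        if (per_rank[r] > eps) eps = per_rank[r];
    return eps;
}
```

Because the maximum does not depend on how the elements are grouped, the combined value equals the eps the serial loop would produce, however the array is split.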
But what should I do with the matrix A? It must be shared by every node.
How do I synchronize each of the three nested for loops? (As you can see, the current element A[i][j][k] is computed from its neighbors: left and right, A[i-1][][] and A[i+1][][]; top and bottom, A[][j-1][] and A[][j+1][]; or front and back, A[][][k-1] and A[][][k+1].)
Thank you!
First edit:
My first idea is to reorder the for loops so that the index carrying the dependency is the innermost one, like this:
for(j=1; j<=N-2; j++)
    for(k=1; k<=N-2; k++)
        // MPI here: send a processor the (j,k) coordinates of the line it should compute with the next statement
        for(i=1; i<=N-2; i++)
        {
            A[i][j][k] = (A[i-1][j][k]+A[i+1][j][k])/2.;
        }
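The reordering above relies on the fact that, in this sweep, a line along i only reads and writes elements with the same (j,k). The following plain C sketch checks that claim on a small array, without MPI: it runs the sweep once over all j, and once in two separate j-ranges processed in the opposite order, and compares the results. N = 6, the fill pattern, and the names fill, sweep_i and same are assumptions made only for this illustration.

```c
#include <assert.h>

#define N 6   /* small size chosen only for the illustration */

/* Fill A with arbitrary, deterministic test data. */
void fill(double A[N][N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j][k] = (double)((i * 7 + j * 3 + k * 5) % 11);
}

/* The i-sweep from the question, restricted to j_lo <= j < j_hi.
   Each "processor" would own one such range of j. */
void sweep_i(double A[N][N][N], int j_lo, int j_hi)
{
    assert(1 <= j_lo && j_lo <= j_hi && j_hi <= N - 1);
    for (int j = j_lo; j < j_hi; j++)
        for (int k = 1; k <= N - 2; k++)
            for (int i = 1; i <= N - 2; i++)
                A[i][j][k] = (A[i-1][j][k] + A[i+1][j][k]) / 2.;
}

/* Returns 1 if both arrays hold exactly the same values, 0 otherwise. */
int same(double A[N][N][N], double B[N][N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                if (A[i][j][k] != B[i][j][k]) return 0;
    return 1;
}
```

Calling sweep_i(S, 1, N-1) on one copy and sweep_i(P, 3, N-1) followed by sweep_i(P, 1, 3) on another should leave S and P identical, which is what allows the j-ranges to be handed to different processors. Note that this only covers a single sweep: the next sweep depends on j, so the same split by j would not be independent there.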
and so on:
for(i=1; i<=N-2; i++)
    for(k=1; k<=N-2; k++)
        // here (i,k) are the free dimensions and the dependency is only on j: send each (i,k) line to a processor
        for(j=1; j<=N-2; j++)
        {
            A[i][j][k] = (A[i][j-1][k]+A[i][j+1][k])/2.;
        }
for(i=1; i<=N-2; i++)
    for(j=1; j<=N-2; j++)
        // the dependency is only on k and (i,j) are free: send each (i,j) line to a processor
        for(k=1; k<=N-2; k++)
        {
            double e;
            e=A[i][j][k];
            A[i][j][k] = (A[i][j][k-1]+A[i][j][k+1])/2.;
            eps=Max(eps,fabs(e-A[i][j][k]));
        }
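To decide which lines go to which processor, one common approach is to give every rank a contiguous block of one free index. The sketch below is plain C index arithmetic only (no MPI calls); block_range is a hypothetical helper introduced for this illustration, and in an MPI program nprocs and rank would typically come from MPI_Comm_size and MPI_Comm_rank.

```c
#include <assert.h>

/* Split the interior indices first..last (inclusive) into nprocs
   contiguous blocks. The given rank owns the half-open range [*lo, *hi).
   When the count does not divide evenly, the lowest ranks get one
   extra index each. (Hypothetical helper for illustration.) */
void block_range(int first, int last, int nprocs, int rank, int *lo, int *hi)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && last >= first - 1);
    int count = last - first + 1;      /* number of indices to distribute */
    int base  = count / nprocs;        /* minimum block size */
    int extra = count % nprocs;        /* ranks 0..extra-1 get base + 1 */
    *lo = first + rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

For example, with N = 10 the interior indices are 1..8; with 3 processors, block_range(1, 8, 3, rank, &lo, &hi) gives the ranges [1,4), [4,7) and [7,9) for ranks 0, 1 and 2.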