I have this plain (sequential) code:
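```c
#include <math.h>   /* fabs */

double eps;
double A[N][N][N];
/* ... */

/* sweep along i */
for (i = 1; i <= N - 2; i++)
    for (j = 1; j <= N - 2; j++)
        for (k = 1; k <= N - 2; k++)
        {
            A[i][j][k] = (A[i-1][j][k] + A[i+1][j][k]) / 2.;
        }

/* sweep along j */
for (i = 1; i <= N - 2; i++)
    for (j = 1; j <= N - 2; j++)
        for (k = 1; k <= N - 2; k++)
        {
            A[i][j][k] = (A[i][j-1][k] + A[i][j+1][k]) / 2.;
        }

/* sweep along k; eps records the largest change */
for (i = 1; i <= N - 2; i++)
    for (j = 1; j <= N - 2; j++)
        for (k = 1; k <= N - 2; k++)
        {
            double e;
            e = A[i][j][k];
            A[i][j][k] = (A[i][j][k-1] + A[i][j][k+1]) / 2.;
            eps = Max(eps, fabs(e - A[i][j][k]));   /* Max is assumed to be a max macro */
        }
```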
I need to parallelize this code using MPI.
I understand what to do with eps: it is a global value that has to be computed across all the data. So each node computes it in a local variable and returns its result, or the local results are combined with a reduction.
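A minimal sketch of the eps part, with the ranks simulated inside one serial program so it runs without an MPI installation. Each simulated rank takes the maximum over the points it owns, and the local maxima are then combined with another maximum. In a real MPI program that last step would be a single call such as `MPI_Allreduce(&local_eps, &eps, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD)`, which leaves the global value on every rank. The names `local_eps`, `reduce_eps` and the flat `change` array are hypothetical, and `Max` is assumed to be the usual two-argument max macro.

```c
#include <assert.h>

/* Assumed definition of the question's Max macro. */
#define Max(a, b) ((a) > (b) ? (a) : (b))

/* Work of one rank: the largest |change| over the points it owns,
 * here the entries lo .. hi-1 of a flat array of per-point changes. */
double local_eps(const double *change, int lo, int hi)
{
    double eps = 0.;   /* changes are >= 0, so 0 is a safe start value */
    for (int n = lo; n < hi; n++)
        eps = Max(eps, change[n]);
    return eps;
}

/* Serial stand-in for
 *   MPI_Allreduce(&local, &eps, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
 * 'nranks' simulated ranks each own a contiguous block of the 'count' values. */
double reduce_eps(const double *change, int count, int nranks)
{
    assert(nranks > 0 && nranks <= count);
    double eps = 0.;
    for (int r = 0; r < nranks; r++) {
        int lo = r * count / nranks;
        int hi = (r + 1) * count / nranks;
        /* stored first because Max evaluates its arguments twice */
        double local = local_eps(change, lo, hi);
        eps = Max(eps, local);
    }
    return eps;
}
```

Because the maximum is associative and commutative, the result does not depend on how the points are split among the ranks; calling `reduce_eps` with different `nranks` gives the same value.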
But what should I do with the array A? It must be shared by every node.
How do I synchronize each of the three loop nests? As you can see, the current element A[i][j][k] is computed from its neighbors: left and right (A[i-1][][], A[i+1][][]), top and bottom (A[][j-1][], A[][j+1][]), or front and back (A[][][k-1], A[][][k+1]).
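Because every sweep updates A in place, when A[i][j][k] is computed in the first nest, A[i-1][j][k] already holds its new value while A[i+1][j][k] still holds its old one. One common way to handle this (a suggestion, not the only option) is to give each rank a slab of whole i-planes. The j- and k-sweeps then touch only planes the rank owns. The i-sweep needs two ghost planes: the old first plane of the next rank, which can be exchanged before the sweep, and the new last plane of the previous rank, which exists only after that rank has finished, so the ranks form a pipeline (receive from r-1, sweep, send to r+1). This sketch simulates the scheme serially, with `memcpy` standing in for the messages, and checks it against the original loop; `N`, `NP`, `slab_lo`, `slab_hi`, `local` and all function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N  10   /* hypothetical problem size (interior planes 1..N-2) */
#define NP 3    /* hypothetical number of simulated MPI ranks */

typedef double Plane[N][N];

/* Serial reference: the question's first loop nest (sweep along i). */
void sweep_i_serial(double A[N][N][N])
{
    for (int i = 1; i <= N - 2; i++)
        for (int j = 1; j <= N - 2; j++)
            for (int k = 1; k <= N - 2; k++)
                A[i][j][k] = (A[i - 1][j][k] + A[i + 1][j][k]) / 2.;
}

/* Rank r owns the contiguous interior planes slab_lo(r) .. slab_hi(r). */
static int slab_lo(int r) { return 1 + r * (N - 2) / NP; }
static int slab_hi(int r) { return slab_lo(r + 1) - 1; }

/* Private memory of each simulated rank. A rank writes only its own planes;
 * planes lo-1 and hi+1 are ghost planes filled by explicit copies. */
static double local[NP][N][N][N];

void sweep_i_slabs(double A[N][N][N])
{
    /* Before the sweep: each rank gets its slab plus both ghost planes.
     * Ghost hi+1 must hold the OLD value, so this exchange happens first. */
    for (int r = 0; r < NP; r++)
        for (int i = slab_lo(r) - 1; i <= slab_hi(r) + 1; i++)
            memcpy(local[r][i], A[i], sizeof(Plane));

    /* Pipeline: ghost lo-1 must hold the NEW value, so rank r waits for
     * rank r-1 (MPI_Recv), sweeps, then passes its last plane on (MPI_Send). */
    for (int r = 0; r < NP; r++) {
        int lo = slab_lo(r), hi = slab_hi(r);
        if (r > 0) /* "message" from rank r-1: its freshly updated plane lo-1 */
            memcpy(local[r][lo - 1], local[r - 1][lo - 1], sizeof(Plane));
        for (int i = lo; i <= hi; i++)
            for (int j = 1; j <= N - 2; j++)
                for (int k = 1; k <= N - 2; k++)
                    local[r][i][j][k] =
                        (local[r][i - 1][j][k] + local[r][i + 1][j][k]) / 2.;
    }

    /* Gather the owned planes back (needed here only for the comparison). */
    for (int r = 0; r < NP; r++)
        for (int i = slab_lo(r); i <= slab_hi(r); i++)
            memcpy(A[i], local[r][i], sizeof(Plane));
}

/* Self-check: the slab/pipeline version must reproduce the serial loop. */
int i_sweep_slabs_match_serial(void)
{
    static double A[N][N][N], B[N][N][N];
    assert(NP <= N - 2);   /* every rank owns at least one plane */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j][k] = B[i][j][k] = (double)((7 * i + 3 * j + 5 * k) % 11);
    sweep_i_serial(A);
    sweep_i_slabs(B);
    return memcmp(A, B, sizeof A) == 0;
}
```

As written, the pipeline is fully serial: rank r idles until rank r-1 is done. A common refinement is to pipeline in chunks of j, sending each finished block of the last plane as soon as it is ready, so that neighbouring ranks overlap; that is not shown here.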
Thank you!
Edit 1:
My first idea is to reorder the for loops so that each loop nest depends on only one index, like this:
```c
for (j = 1; j <= N - 2; j++)
    for (k = 1; k <= N - 2; k++)
        // MPI here: send the (j,k) coordinates of the line to the process that computes the loop below
        for (i = 1; i <= N - 2; i++)
        {
            A[i][j][k] = (A[i-1][j][k] + A[i+1][j][k]) / 2.;
        }
```
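The reordering above is valid for this nest: each (j,k) line is an independent recurrence along i, so the lines can be computed in any order or by different processes, as long as i itself still runs upward from 1 to N-2. A small self-check, with a hypothetical `N` and hypothetical function names, compares the original i-j-k order with the j-k-i order and with the (j,k) lines visited in reverse.

```c
#include <assert.h>
#include <string.h>

#define N 8   /* hypothetical small size for the check */

/* Original order: i outermost (the question's first loop nest). */
void sweep_i_ijk(double A[N][N][N])
{
    for (int i = 1; i <= N - 2; i++)
        for (int j = 1; j <= N - 2; j++)
            for (int k = 1; k <= N - 2; k++)
                A[i][j][k] = (A[i - 1][j][k] + A[i + 1][j][k]) / 2.;
}

/* Order from the edit: one (j,k) line at a time, i innermost.
 * With rev != 0 the lines are visited in reverse, as a stand-in for
 * "any process may take any line"; i itself still runs upward. */
void sweep_i_lines(double A[N][N][N], int rev)
{
    for (int jj = 1; jj <= N - 2; jj++)
        for (int kk = 1; kk <= N - 2; kk++) {
            int j = rev ? N - 1 - jj : jj;
            int k = rev ? N - 1 - kk : kk;
            for (int i = 1; i <= N - 2; i++)
                A[i][j][k] = (A[i - 1][j][k] + A[i + 1][j][k]) / 2.;
        }
}

/* Returns 1 if both reorderings reproduce the original loop bit for bit. */
int line_orders_agree(void)
{
    static double A[N][N][N], B[N][N][N], C[N][N][N];
    assert(N >= 3);   /* at least one interior point per direction */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j][k] = B[i][j][k] = C[i][j][k] =
                    (double)((7 * i + 3 * j + 5 * k) % 11);
    sweep_i_ijk(A);
    sweep_i_lines(B, 0);
    sweep_i_lines(C, 1);
    return memcmp(A, B, sizeof A) == 0 && memcmp(A, C, sizeof A) == 0;
}
```

Handing out (j,k) lines means a process needs whole lines along i for this nest, whole lines along j for the next one, and so on, so the data would have to be redistributed between sweeps (for example with `MPI_Alltoall`). Keeping whole i-planes per process avoids that for the j- and k-sweeps.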
And the same for the other two nests:
```c
for (i = 1; i <= N - 2; i++)
    for (k = 1; k <= N - 2; k++)
        // (i,k) are free dimensions; the dependency is only along j. Hand the (i,k) lines out to the processes
        for (j = 1; j <= N - 2; j++)
        {
            A[i][j][k] = (A[i][j-1][k] + A[i][j+1][k]) / 2.;
        }

for (i = 1; i <= N - 2; i++)
    for (j = 1; j <= N - 2; j++)
        // the dependency is only along k; (i,j) are free. Send the (i,j) line to a process
        for (k = 1; k <= N - 2; k++)
        {
            double e;
            e = A[i][j][k];
            A[i][j][k] = (A[i][j][k-1] + A[i][j][k+1]) / 2.;
            eps = Max(eps, fabs(e - A[i][j][k]));
        }
```
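If each process owns whole i-planes, the third nest (and likewise the second) needs no communication at all, because every (i,j) line lies inside one process's slab. Each process can then compute its own eps over its slab, and only that single number has to be reduced. The serial sketch below uses hypothetical `N`, `NP` and function names, `dabs` as a small stand-in for `fabs` so no math library is needed, and `Max` assumed to be the usual macro; it processes the slabs in reverse order and checks that both A and eps match the serial loop.

```c
#include <assert.h>
#include <string.h>

#define N  9    /* hypothetical problem size */
#define NP 4    /* hypothetical number of simulated MPI ranks */

/* Assumed definition of the question's Max macro. */
#define Max(a, b) ((a) > (b) ? (a) : (b))

/* Stand-in for fabs so the sketch needs no math library. */
static double dabs(double x) { return x < 0 ? -x : x; }

/* The question's third loop nest restricted to planes lo..hi. A rank that
 * owns whole i-planes needs no neighbour data for this sweep. */
double sweep_k_planes(double A[N][N][N], int lo, int hi)
{
    double eps = 0.;
    for (int i = lo; i <= hi; i++)
        for (int j = 1; j <= N - 2; j++)
            for (int k = 1; k <= N - 2; k++) {
                double e = A[i][j][k];
                A[i][j][k] = (A[i][j][k - 1] + A[i][j][k + 1]) / 2.;
                eps = Max(eps, dabs(e - A[i][j][k]));
            }
    return eps;
}

/* Serial stand-in for the parallel version: every rank sweeps its own slab
 * and keeps a local eps; the local values are then combined with a max,
 * as MPI_Allreduce with MPI_MAX would do. Slabs are visited in reverse
 * to show that their order does not matter for this sweep. */
double sweep_k_slabs(double A[N][N][N])
{
    double eps = 0.;
    for (int r = NP - 1; r >= 0; r--) {
        int lo = 1 + r * (N - 2) / NP;
        int hi = (r + 1) * (N - 2) / NP;
        double local = sweep_k_planes(A, lo, hi);
        eps = Max(eps, local);
    }
    return eps;
}

/* Self-check against the serial loop: same array and same eps. */
int k_sweep_slabs_match_serial(void)
{
    static double A[N][N][N], B[N][N][N];
    assert(NP <= N - 2);   /* every rank owns at least one plane */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j][k] = B[i][j][k] = (double)((7 * i + 3 * j + 5 * k) % 11);
    double eps_serial = sweep_k_planes(A, 1, N - 2);
    double eps_slabs = sweep_k_slabs(B);
    return eps_serial == eps_slabs && eps_serial > 0. &&
           memcmp(A, B, sizeof A) == 0;
}
```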