I'm trying to parallelize some code that looks like this:
#pragma omp parallel private(i,j,k)
#pragma omp parallel for shared(A)
for(k=0;k<100;k++)
  for(i=1;i<1024;i++)
    for(j=0;j<1024;j++)
      A[i][j+1] = /* some expression involving elements of A[i-1][j-1] */;
When I run this code, I get a different result than when the loops run serially, and I can't work out what I'm doing wrong.
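A likely explanation: the k and i loops both carry dependences. Different k iterations write the same elements A[i][j+1], so running them concurrently is a data race, and row i reads row i-1, which was written earlier in the same k iteration. For an expression that reads only row i-1, the j loop has no loop-carried dependence. Also, stacking `#pragma omp parallel` over `#pragma omp parallel for` opens a nested parallel region; when nested parallelism is not active, each thread of the outer team typically runs the entire loop nest on its own. The minimal sketch below keeps k and i serial inside one parallel region and work-shares only the j loop. The sizes `N` and `SWEEPS`, the expression in `update`, and the helpers `init_grid`, `sweep_serial`, `sweep_parallel` and `verify_against_serial` are hypothetical stand-ins, not the original code.

```c
#include <assert.h>

/* Hypothetical sizes; the question uses 1024 rows/columns and 100 sweeps. */
#define N 64
#define SWEEPS 10

/* Hypothetical stand-in for the real expression: values written in row i
   are computed only from row i-1. The grid has N + 1 columns because the
   loop writes A[i][j+1] for j up to N - 1. */
static double update(double a[N][N + 1], int i, int j)
{
    return 0.5 * (a[i - 1][j] + a[i - 1][j + 1]);
}

/* Deterministic test input. */
void init_grid(double a[N][N + 1])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= N; j++)
            a[i][j] = (double)((i * 7 + j * 3) % 11);
}

/* Reference: the loop nest run serially. */
void sweep_serial(double a[N][N + 1])
{
    for (int k = 0; k < SWEEPS; k++)
        for (int i = 1; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j + 1] = update(a, i, j);
}

/* One team of threads for the whole nest. Every thread steps through k and i
   (declared inside the region, so private) in the same order; only the j
   iterations are divided among threads. The implicit barrier at the end of
   "omp for" finishes row i before any thread starts row i + 1. */
void sweep_parallel(double a[N][N + 1])
{
    #pragma omp parallel
    for (int k = 0; k < SWEEPS; k++)
        for (int i = 1; i < N; i++) {
            #pragma omp for
            for (int j = 0; j < N; j++)
                a[i][j + 1] = update(a, i, j);
        }
}

/* Runs both versions on identical input and asserts identical results. */
void verify_against_serial(void)
{
    static double s[N][N + 1], p[N][N + 1];
    init_grid(s);
    init_grid(p);
    sweep_serial(s);
    sweep_parallel(p);
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= N; j++)
            assert(s[i][j] == p[i][j]);
}
```

Compile with `-fopenmp` (GCC or Clang). Without it the pragmas are ignored and both versions run serially, which still gives the same answer.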
I've also tried the collapse() clause:
#pragma omp parallel private(i,j,k)
#pragma omp parallel for collapse(3) shared(A)
for(k=0;k<100;k++)
  for(i=1;i<1024;i++)
    for(j=0;j<1024;j++)
      A[i][j+1] = /* some expression involving elements of A[][] */;
Another thing I tried was putting a #pragma omp parallel for before each loop instead of using collapse().
I think the issue is the data dependency. Any idea how to parallelize loops that have a data dependency?
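When every loop in a nest carries a dependence, one commonly used technique (swapped in here; whether it fits depends on the real expression, which isn't shown) is wavefront, or anti-diagonal, parallelization. Below is a minimal sketch assuming a hypothetical recurrence in which each cell reads the cell above and the cell to its left. Cells with the same i + j read only cells on the previous anti-diagonal, so each diagonal can be computed in parallel once the previous one is done. `M`, `cell`, `fill_boundary`, `recur_serial`, `recur_wavefront` and `verify_wavefront` are hypothetical names.

```c
#include <assert.h>

#define M 48 /* hypothetical grid size (rows and columns) */

/* Hypothetical recurrence: each cell reads the cell above and the cell to
   its left, so both the i loop and the j loop carry a dependence. */
static double cell(double a[M][M], int i, int j)
{
    return 0.25 * a[i - 1][j] + 0.5 * a[i][j - 1] + 1.0;
}

/* Deterministic input; row 0 and column 0 act as fixed boundary values. */
void fill_boundary(double a[M][M])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            a[i][j] = (double)((i + 2 * j) % 5);
}

/* Reference: plain row-by-row evaluation. */
void recur_serial(double a[M][M])
{
    for (int i = 1; i < M; i++)
        for (int j = 1; j < M; j++)
            a[i][j] = cell(a, i, j);
}

/* Sweep anti-diagonals d = i + j in order; the cells on one diagonal are
   independent, and the implicit barrier of "parallel for" finishes diagonal
   d before diagonal d + 1 starts. */
void recur_wavefront(double a[M][M])
{
    for (int d = 2; d <= 2 * (M - 1); d++) {
        int lo = d - (M - 1) > 1 ? d - (M - 1) : 1; /* smallest i with j <= M - 1 */
        int hi = d - 1 < M - 1 ? d - 1 : M - 1;     /* largest i with j >= 1 */
        #pragma omp parallel for
        for (int i = lo; i <= hi; i++)
            a[i][d - i] = cell(a, i, d - i);
    }
}

/* Asserts that the wavefront order reproduces the serial result exactly. */
void verify_wavefront(void)
{
    static double s[M][M], w[M][M];
    fill_boundary(s);
    fill_boundary(w);
    recur_serial(s);
    recur_wavefront(w);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            assert(s[i][j] == w[i][j]);
}
```

Short diagonals near the corners give each thread very little work, so practical versions usually group cells into tiles and run the wavefront over tiles instead of single cells; that refinement is left out of this sketch.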