I have several nested loops and have made the outermost one parallel. apar and mpar are structs whose fields are modified inside the loops; the function breakLogic is then called, and the struct it returns is stored in a pre-allocated vector of those structs (sinResVec).
one, two, ... have been declared earlier in the function.
I have tried adding ordered and critical to ensure correctness, but I am still getting incorrect results.
    #pragma omp parallel for ordered private(appFlip, atur, apar, mpar, i, j, k, l, m, n) shared(rawFlip)
    for(i=0; i<oneL; i++)
    {
        // initialize mpar
        #pragma omp critical
        apar.one = one[i];
        for(j=0; j<twoL; j++)
        {
            apar.two = two[j];
            for(k=0; k<threeL; k++)
            {
                apar.three = floor(three[k]*apar.two);
                appFlip = applyParamSin(rawFlip, apar);
                for(l=0; l<fourL; l++)
                {
                    mpar.four = four[l];
                    for(m=0; m<fiveL; m++)
                    {
                        mpar.five = five[m];
                        for(n=0; n<sixL; n++)
                        {
                            mpar.six = add[n];
                            atur = breakLogic(appFlip, mpar, dt);
                            #pragma omp ordered
                            {
                                sinResVec[itr] = atur;
                                itr++;
                            }
                        }
                    }
                }
                r0(appFlip);
            }
        }
    }
Is this code simply not conducive to parallelism? Are there any tools for g++ that can profile code for parallel processing and flag potential issues?
This modified code works, but it gives no performance improvement.
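One way to avoid the shared counter itr (and with it the ordered block, which forces the stores to happen one at a time) is to compute each result's slot directly from the loop indices. The sketch below is a simplified illustration under assumptions, not the real code: Params, Result, breakLogicStub and runSweep are hypothetical stand-ins, only four of the six loops are kept, and the per-iteration struct is declared inside the loop so that each thread gets its own copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real parameter/result structs.
struct Params { double four, five, six; };
struct Result { double value; };

// Hypothetical placeholder for breakLogic: any pure function of its inputs.
static Result breakLogicStub(double flip, const Params& p) {
    return Result{flip + p.four * 100.0 + p.five * 10.0 + p.six};
}

// Fills the output so that each slot depends only on the loop indices,
// never on the order in which threads happen to finish.
std::vector<Result> runSweep(const std::vector<double>& one,
                             const std::vector<double>& four,
                             const std::vector<double>& five,
                             const std::vector<double>& six) {
    const long oneL  = static_cast<long>(one.size());
    const long fourL = static_cast<long>(four.size());
    const long fiveL = static_cast<long>(five.size());
    const long sixL  = static_cast<long>(six.size());

    // Pre-size the vector once; threads then write to disjoint slots.
    std::vector<Result> out(static_cast<std::size_t>(oneL * fourL * fiveL * sixL));

    #pragma omp parallel for
    for (long i = 0; i < oneL; i++) {
        Params mpar{};  // declared inside the loop, so private to each iteration
        for (long l = 0; l < fourL; l++) {
            mpar.four = four[l];
            for (long m = 0; m < fiveL; m++) {
                mpar.five = five[m];
                for (long n = 0; n < sixL; n++) {
                    mpar.six = six[n];
                    // Row-major index: the same value a serial itr++ would reach here.
                    const std::size_t idx = static_cast<std::size_t>(
                        ((i * fourL + l) * fiveL + m) * sixL + n);
                    assert(idx < out.size());
                    out[idx] = breakLogicStub(one[i], mpar);  // no ordered/critical needed
                }
            }
        }
    }
    return out;
}
```

Built with g++ -fopenmp the outer loop is split across threads; without that flag the pragma is ignored and the function runs serially, and the contents of the returned vector are the same in both cases.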