I am trying to parallelize the inner for loops inside a parallel region:
    int num_threads, num_procs, max_row;
    double start_time;
    int i, h;

    #pragma omp parallel default(none) \
        shared(A, x, b, n, num_threads, num_procs, start_time, max_row)
    {
        int id = omp_get_thread_num();
        if (id == 0)
        {
            num_threads = omp_get_num_threads();
            num_procs = omp_get_num_procs();
            printf("Number of processors: %d\n", num_procs);
            printf("Actual number of threads: %d\n", num_threads);
            start_time = omp_get_wtime();
        }

        for (int i = 0; i < n-1; i++) // for col 0 ... n-2
        {
            // partial pivoting
            max_row = find_max(A, n, i);
            if (i != max_row)
            {
                swap_rows(A, n, b, i, max_row);
            }

            #pragma omp for
            for (int j = i+1; j < n; j++) // for each elt under pivot
            {
                double scalar = A[j*n+i] / A[i*n+i];
                for (int k = 0; k < n; k++) // update values across row
                {
                    A[j*n+k] -= (scalar * A[i*n+k]);
                }
                b[j] -= (scalar * b[i]);
            }
            // implicit barrier
        }

        for (int h = n-1; h >= 0; h--)
        {
            x[h] = b[h] / A[h*n+h];

            #pragma omp for
            for (int l = h-1; l >= 0; l--)
            {
                b[l] -= A[l*n+h] * x[h];
            }
            // implicit barrier
        }
    } // end parallel region
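I don't understand how to control the threads so that only one thread evaluates the outer for loop header, rather than all of them. Isn't it inefficient to have every thread execute the outer for loop header? My book suggests that it is better to have one parallel region than two #pragma omp parallel for statements, but it gets confusing when there is this much code inside a parallel region.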
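For reference, here is a minimal sketch of one way the "only one thread does this part" idea is usually written inside a single parallel region, using `#pragma omp single`. It is not a definitive fix. The bodies of `find_max` and `swap_rows` are not shown above, so the versions below are stand-ins I am assuming, and the `solve` wrapper is a hypothetical name added only to make the sketch self-contained. Each thread keeps its own private copy of the outer loop counter, so evaluating the header costs one increment and one comparison per thread. What has to be restricted to one thread is the pivoting step and the write to `x[h]`, because several threads swapping rows of the shared `A` at the same time is a data race.

```c
#include <math.h>

/* Assumed stand-in for find_max (body not shown in the question):
   index of the row r >= col with the largest |A[r*n+col]|. */
static int find_max(const double *A, int n, int col)
{
    int best = col;
    for (int r = col + 1; r < n; r++)
        if (fabs(A[r * n + col]) > fabs(A[best * n + col]))
            best = r;
    return best;
}

/* Assumed stand-in for swap_rows: swaps rows r1 and r2 of A and of b. */
static void swap_rows(double *A, int n, double *b, int r1, int r2)
{
    for (int k = 0; k < n; k++) {
        double t = A[r1 * n + k];
        A[r1 * n + k] = A[r2 * n + k];
        A[r2 * n + k] = t;
    }
    double t = b[r1];
    b[r1] = b[r2];
    b[r2] = t;
}

/* Hypothetical wrapper: solves A x = b in place, A is n x n, row-major. */
void solve(double *A, double *b, double *x, int n)
{
    #pragma omp parallel default(none) shared(A, x, b, n)
    {
        /* Every thread runs this header on its own private i. */
        for (int i = 0; i < n - 1; i++) {
            /* Exactly one thread pivots. The implicit barrier at the end
               of single makes the others wait until the swap is done. */
            #pragma omp single
            {
                int max_row = find_max(A, n, i);
                if (i != max_row)
                    swap_rows(A, n, b, i, max_row);
            }

            /* Rows below the pivot are divided among the threads. */
            #pragma omp for
            for (int j = i + 1; j < n; j++) {
                double scalar = A[j * n + i] / A[i * n + i];
                for (int k = 0; k < n; k++)
                    A[j * n + k] -= scalar * A[i * n + k];
                b[j] -= scalar * b[i];
            }
            /* implicit barrier at the end of the omp for */
        }

        /* Back substitution: one thread writes x[h], all threads update b. */
        for (int h = n - 1; h >= 0; h--) {
            #pragma omp single
            x[h] = b[h] / A[h * n + h];

            #pragma omp for
            for (int l = h - 1; l >= 0; l--)
                b[l] -= A[l * n + h] * x[h];
        }
    }
}
```

The sketch only uses pragmas, so it also compiles and runs serially when OpenMP is not enabled. Calling `solve(A, b, x, 3)` on a small 3x3 system is an easy way to check it against a hand-computed solution. The timing and thread-count printing from the original could go into another `single` block in the same way.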