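Is it possible to parallelize std::inner_product() from C++ with the omp.h library? Unfortunately, I cannot use __gnu_parallel::inner_product(), which is available in newer versions of gcc. I know I could implement my own inner_product and parallelize that, but I would like to use the standard approach.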
2
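Short answer: no.
The whole point of algorithms like inner_product is that they abstract the loop away from you. To parallelize the algorithm, however, you need to parallelize that loop, either with #pragma omp parallel for or with parallel sections. Both approaches are inherently tied to the loop as it appears in the code, so even if the loop is parallelizable (which it most likely is), you would have to put the OpenMP pragma inside the function itself to apply parallelism to it.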
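To make the point concrete, here is a minimal sketch of what "putting the pragma inside the function" could look like if you write the loop yourself. The name parallel_inner_product is hypothetical (it is not part of the standard library or of OpenMP), and the sketch assumes both vectors hold long values and that b is at least as long as a.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not a standard or OpenMP function: a hand-written
// inner product whose loop is visible to the OpenMP pragma.
// Assumes b.size() >= a.size().
long parallel_inner_product(const std::vector<long>& a,
                            const std::vector<long>& b,
                            long init) {
    long sum = 0;
    // A signed loop index keeps the loop acceptable to older OpenMP versions.
    const long n = static_cast<long>(a.size());

    // Each thread accumulates a private partial sum; OpenMP adds the
    // partial sums together when the loop ends. If the code is compiled
    // without OpenMP support, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return init + sum;
}
```

Called as parallel_inner_product(a, b, 0L), it plays the same role as std::inner_product(a.begin(), a.end(), b.begin(), 0L), but the loop is now yours rather than the library's, which is exactly the trade-off described above.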
answered 2012-12-07T13:29:57.617
2
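Following up on Hristo's comment: you can do this by splitting the arrays across the threads, calling inner_product on each sub-array, and then combining the partial results with some kind of reduction operation: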

    #include <iostream>
    #include <numeric>
    #include <omp.h>
    #include <sys/time.h>

    void tick(struct timeval *t);
    double tock(struct timeval *t);

    int main (int argc, char **argv) {
        const long int nelements = 1000000;
        long int *a = new long int[nelements];
        long int *b = new long int[nelements];
        long int sum = 0;
        struct timeval t;
        double time;

        // a = 1..nelements and b = all ones, so the inner product
        // must equal nelements*(nelements+1)/2.
        #pragma omp parallel for
        for (long int i=0; i<nelements; i++) {
            a[i] = i+1;
            b[i] = 1;
        }

        tick(&t);
        // nelements is const; listing it as firstprivate keeps default(none)
        // valid with both older and newer GCC releases.
        #pragma omp parallel default(none) reduction(+:sum) shared(a,b) firstprivate(nelements)
        {
            // Ask for the team size inside the region that does the work.
            const int nthreads = omp_get_num_threads();
            const int tid = omp_get_thread_num();
            const long int nitems = nelements/nthreads;
            const long int start = tid*nitems;
            long int end = start + nitems;
            if (tid == nthreads-1) end = nelements;   // last thread takes the remainder

            // Each thread handles its own sub-array; reduction(+:sum)
            // adds the per-thread results together.
            sum += std::inner_product(a+start, a+end, b+start, 0L);
        }
        time = tock(&t);

        std::cout << "using omp: sum = " << sum << " time = " << time << std::endl;

        delete [] a;
        delete [] b;

        // Repeat with a plain single-threaded call for comparison.
        a = new long int[nelements];
        b = new long int[nelements];
        sum = 0;
        for (long int i=0; i<nelements; i++) {
            a[i] = i+1;
            b[i] = 1;
        }

        tick(&t);
        sum = std::inner_product(a, a+nelements, b, 0L);
        time = tock(&t);

        std::cout << "single threaded: sum = " << sum << " time = " << time << std::endl;
        std::cout << "correct answer: sum = " << (nelements)*(nelements+1)/2 << std::endl;

        delete [] a;
        delete [] b;
        return 0;
    }

    /* stores the current time in t */
    void tick(struct timeval *t) {
        gettimeofday(t, NULL);
    }

    /* returns the seconds elapsed between the time stored in t and now */
    double tock(struct timeval *t) {
        struct timeval now;
        gettimeofday(&now, NULL);
        return (double)(now.tv_sec - t->tv_sec) + ((double)(now.tv_usec - t->tv_usec)/1000000.);
    }

Running it gives a better speedup than I expected:

    $ for NT in 1 2 4 8; do export OMP_NUM_THREADS=${NT}; echo; echo "NTHREADS=${NT}";./inner; done

    NTHREADS=1
    using omp: sum = 500000500000 time = 0.004675
    single threaded: sum = 500000500000 time = 0.004765
    correct answer: sum = 500000500000

    NTHREADS=2
    using omp: sum = 500000500000 time = 0.002317
    single threaded: sum = 500000500000 time = 0.004773
    correct answer: sum = 500000500000

    NTHREADS=4
    using omp: sum = 500000500000 time = 0.001205
    single threaded: sum = 500000500000 time = 0.004758
    correct answer: sum = 500000500000

    NTHREADS=8
    using omp: sum = 500000500000 time = 0.000617
    single threaded: sum = 500000500000 time = 0.004784
    correct answer: sum = 500000500000

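For anyone who wants to turn the timings above into speedup figures, the arithmetic is just the single-threaded time divided by the OpenMP time, and the efficiency is that ratio divided by the thread count. The sketch below does this for the four runs; the Timing struct and the function names are made up for illustration, and the numbers are single measurements copied from the output above, so treat the results as rough.

```cpp
#include <vector>

// One run from the output above: thread count, OpenMP time and
// single-threaded time, both in seconds.
struct Timing {
    int threads;
    double omp_seconds;
    double serial_seconds;
};

// Speedup = single-threaded time / OpenMP time.
double speedup(const Timing& t) { return t.serial_seconds / t.omp_seconds; }

// Parallel efficiency = speedup / number of threads (1.0 would be ideal).
double efficiency(const Timing& t) { return speedup(t) / t.threads; }

// The four measurements reported above, copied as printed.
std::vector<Timing> reported_timings() {
    return {
        {1, 0.004675, 0.004765},
        {2, 0.002317, 0.004773},
        {4, 0.001205, 0.004758},
        {8, 0.000617, 0.004784},
    };
}
```

With these inputs the speedup comes out at roughly 2.06, 3.95 and 7.75 for 2, 4 and 8 threads, i.e. an efficiency of about 0.97 or better in every case.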
answered 2012-12-07T14:48:18.253