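I am trying to implement a parallel algorithm using tasks in OpenMP. The parallel programming pattern is based on the producer-consumer idea, but since the consumers are slower than the producers, I want to use a few producers and several consumers. The main idea is to create as many OS threads as there are producers, and then have each of those threads create tasks to be executed in parallel (by the consumers). Every producer is associated with a proportional number of consumers (i.e. numCheckers/numSeekers). I am running the algorithm on a dual-chip Intel server with 6 cores per chip.

The problem is that when I use only one producer (seeker) and an increasing number of consumers (checkers), performance degrades very quickly as the number of consumers grows (see the table below), even though the correct number of cores are running at 100%. On the other hand, if I increase the number of producers, the average time decreases or at least stays stable, even with a proportional number of consumers.

It seems to me that all the improvement comes from dividing the input among the producers, and that the tasks are only getting in the way. But again, I have no explanation for the behaviour with a single producer. Am I missing something in the OpenMP task logic? Am I doing something wrong?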
-------------------------------------------------------------------------
| producers | consumers | time |
-------------------------------------------------------------------------
| 1 | 1 | 0.642935 |
| 1 | 2 | 3.004023 |
| 1 | 3 | 5.332524 |
| 1 | 4 | 7.222009 |
| 1 | 5 | 9.472093 |
| 1 | 6 | 10.372389 |
| 1 | 7 | 12.671839 |
| 1 | 8 | 14.631013 |
| 1 | 9 | 14.500603 |
| 1 | 10 | 18.034931 |
| 1 | 11 | 17.835978 |
-------------------------------------------------------------------------
| 2 | 2 | 0.357881 |
| 2 | 4 | 0.361383 |
| 2 | 6 | 0.362556 |
| 2 | 8 | 0.359722 |
| 2 | 10 | 0.358816 |
-------------------------------------------------------------------------
The main part of my code is as follows:
int seek(const char* buffer, int n);  // defined below

int main(int argc, char** argv) {
    // ... process the input (read from file, etc.)
    const char *buffer_start[numSeekers];
    int buffer_len[numSeekers];
    // Populate these arrays by dividing the input.
    // I need to do this by hand because the buffers must overlap for
    // correctness, so a simple parallel-for is not enough.

    // Here is where I create the producers
    int num = 0;
    #pragma omp parallel for num_threads(numSeekers) reduction(+:num)
    for (int i = 0; i < numSeekers; i++) {
        num += seek(buffer_start[i], buffer_len[i]);
    }
    return num;
}
int seek(const char* buffer, int n) {
    int num = 0;
    // assign the same number of consumers to each producer
    #pragma omp parallel num_threads(numCheckers/numSeekers) shared(num)
    {
        // executed only once per producer
        #pragma omp single
        {
            for (int pos = 0; pos < n; pos += STEP) {
                if (condition(buffer[pos])) {
                    #pragma omp task shared(num)
                    {
                        // check() is a sequential function
                        int found = check(buffer[pos]);
                        // num is shared between tasks, so update it atomically
                        #pragma omp atomic
                        num += found;
                    }
                }
            }
            #pragma omp taskwait
        }
    }  // the parallel region must end before the function returns
    return num;
}
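
In case it helps to reproduce the setup, here is a minimal self-contained sketch of the same seeker/checker layout. Several things in it are assumptions and not part of my real code: condition() and check() are hypothetical stand-ins that simply look for the character 'a', count_hits() is a made-up driver name, the input is split into non-overlapping chunks (my real buffers overlap), and STEP is passed as a parameter. It also assumes that nested parallelism has to be enabled explicitly, since many OpenMP runtimes otherwise give the inner region a single thread, and it updates the shared counter atomically. It is only a sketch of the structure, not a claim about where the slowdown comes from.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-ins for the real condition() and check(). */
static int condition(char c) { return c == 'a'; }
static int check(char c) { return c == 'a' ? 1 : 0; }

/* One producer: scans its slice and spawns one task per candidate position. */
static int seek(const char *buffer, int n, int step, int checkers)
{
    int num = 0;
    #pragma omp parallel num_threads(checkers) shared(num)
    {
        #pragma omp single
        {
            for (int pos = 0; pos < n; pos += step) {
                if (condition(buffer[pos])) {
                    #pragma omp task firstprivate(pos) shared(num, buffer)
                    {
                        int hit = check(buffer[pos]);
                        /* num is shared by all tasks of this producer */
                        #pragma omp atomic
                        num += hit;
                    }
                }
            }
        } /* implicit barrier of single: all tasks have finished here */
    }
    return num;
}

/* Driver (assumed name): one thread per seeker, each with its own team. */
int count_hits(const char *input, int len, int num_seekers, int num_checkers)
{
    assert(num_seekers > 0 && num_checkers >= num_seekers);
#ifdef _OPENMP
    /* Assumption: allow two nested levels of active parallel regions. */
    omp_set_max_active_levels(2);
#endif
    int total = 0;
    int chunk = (len + num_seekers - 1) / num_seekers; /* ceiling division */
    #pragma omp parallel for num_threads(num_seekers) reduction(+:total)
    for (int i = 0; i < num_seekers; i++) {
        int start = i * chunk;
        int n = 0;
        if (start < len)
            n = (len - start < chunk) ? len - start : chunk;
        total += seek(input + start, n, 1, num_checkers / num_seekers);
    }
    return total;
}
```

Built with OpenMP enabled (for example gcc -fopenmp), count_hits("banana", 6, 2, 4) splits the input into "ban" and "ana", gives each of the two seekers two checkers, and returns 3.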