C++ - TBB parallel convolution of images with different depths

Question

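I want to use TBB (the parallel_for pattern) to convolve a large number of images concurrently, with each processor core convolving a single image. However, the images differ in depth: each one is mono grayscale (1 channel), stereo grayscale (2 channels), mono RGB (3 channels), stereo RGB (6 channels), and so on.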

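It turns out that the workload on the different threads (cores) is uneven and keeps changing. How do I use parallel_for correctly for this task, or should I consider a different parallel pattern?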
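To see why a fixed up-front split goes wrong, here is a small standard-C++ sketch (no TBB) of a rough cost model. The names `ImageInfo`, `convolutionCost`, and `staticBlockLoads` are hypothetical, and the cost formula assumes a direct k x k convolution whose work scales with pixels times channels. It estimates the load each thread would receive if the images were divided into equal contiguous blocks ahead of time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image descriptor: only the sizes matter for the cost estimate.
struct ImageInfo {
    std::size_t width;
    std::size_t height;
    std::size_t channels;  // 1 = mono gray, 2 = stereo gray, 3 = mono RGB, 6 = stereo RGB
};

// Rough cost of a direct 2-D convolution with a k x k kernel:
// one multiply-add per kernel tap, per pixel, per channel.
std::size_t convolutionCost(const ImageInfo& img, std::size_t k) {
    return img.width * img.height * img.channels * k * k;
}

// Load each of `workers` threads would get if the images were split into
// equal-sized contiguous blocks up front (a purely static schedule).
std::vector<std::size_t> staticBlockLoads(const std::vector<ImageInfo>& images,
                                          std::size_t workers, std::size_t k) {
    assert(workers > 0);
    std::vector<std::size_t> loads(workers, 0);
    const std::size_t n = images.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Image i goes to block i * workers / n, so block sizes differ by at most one image.
        loads[i * workers / n] += convolutionCost(images[i], k);
    }
    return loads;
}
```

For example, with two 1-channel and two 6-channel 640x480 images and a 3x3 kernel split across two workers, the second worker gets six times the work of the first, even though both receive two images.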

score 1 · Accepted Answer

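tbb::parallel_for in the form parallel_for(first, last, lambda) already does some load balancing, so try that first. However, it relies on a heuristic to pick a good grain size, and that heuristic can sometimes be fooled.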

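For the best load balancing, possibly at the cost of extra per-iteration overhead, use the range-based tbb::parallel_for with a grain size of 1 and simple_partitioner. This forces every iteration to run as a separate task, which gives the TBB runtime maximum flexibility to rebalance the load. Below is an example that executes 100 iterations, each with a random delay.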

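#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>
#include <cstdio>    // std::printf
#include <cstdlib>   // random() (POSIX)
#include <unistd.h>  // usleep() (POSIX)

int main() {
    tbb::parallel_for(
        tbb::blocked_range<int>(0, 100, 1),  // Interval [0,100) with grainsize == 1
        [](const tbb::blocked_range<int>& r) {
            // With simple_partitioner and grainsize 1, r always holds exactly one index.
            for (int i = r.begin(); i != r.end(); ++i) {
                std::printf("%d\n", i);
                usleep(random() % 1000000);  // simulate uneven work: up to 1 s per iteration
            }
        },
        tbb::simple_partitioner());  // split the range all the way down to the grainsize
}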
