c++ - Allocating memory per thread in a parallel_for loop

Question

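I originally had a single-threaded loop that iterates over all pixels of an image and can perform various operations on the data.

The library I'm using requires that pixels be retrieved from an image one line at a time. To do this, I malloc a block of memory that can hold one row of pixels. (BMM_Color_fl is a struct that holds one pixel's RGBA data as four float values, and GetLinearPixels() copies one row of pixels from the bitmap into a BMM_Color_fl array.)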

BMM_Color_fl* line = (BMM_Color_fl*)malloc(width * sizeof(BMM_Color_fl));
for (int y = 0; y < height; y++)
{   
    bmp->GetLinearPixels(0, y, width, line); //Copy data of row Y from bitmap into line.
    BMM_Color_fl* pixel = line; //Get first pixel of line.
    for (int x = 0; x < width; x++, pixel++) // For each pixel in the row...
    {
        //Do stuff with a pixel.
    }
}
free(line);

So far, so good!

To reduce the execution time of this loop, I wrote a concurrent version using parallel_for, which looks like this:

parallel_for(0, height, [&](int y)
{   
    BMM_Color_fl* line = (BMM_Color_fl*)malloc(width * sizeof(BMM_Color_fl));
    bmp->GetLinearPixels(0, y, width, line);
    BMM_Color_fl* pixel = line;
    for (int x = 0; x < width; x++, pixel++)
    {
        //Do stuff with a pixel.
    }
    free(line);
});

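Although the multithreaded loop is already faster than the original, I realized that the threads cannot all share the same memory block. So I currently allocate and free the memory on every loop iteration. This is obviously wasteful, because there will never be more threads than loop iterations.

My question is whether, and how, I can have each thread malloc exactly one line buffer and reuse it (ideally freeing it at the end).

As a disclaimer, I should mention that I'm new to C++.

Implementation of the suggested solution: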

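#include <ppl.h>    // Concurrency::combinable, Concurrency::parallel_for
#include <vector>

Concurrency::combinable<std::vector<BMM_Color_fl>> line;

Concurrency::parallel_for(0, height, [&](int y)
{
    // Bind a reference: local() returns this thread's own vector. Copying it into a
    // local variable would discard the buffer at the end of every iteration.
    std::vector<BMM_Color_fl>& lineL = line.local();
    // resize(), not reserve(): reserve() leaves size() == 0, so writing through
    // &lineL[0] would be undefined behavior.
    if (lineL.size() < static_cast<std::size_t>(width)) lineL.resize(width);

    bmp->GetLinearPixels(0, y, width, lineL.data()); //Copy row y into this thread's buffer.

    for (int x = 0; x < width; x++)
    {
         BMM_Color_fl* pixel = &lineL[x];
         //Do stuff with a pixel.
    }
});

As suggested, I dropped malloc and replaced it with a std::vector.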
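Outside MSVC's PPL, the same one-buffer-per-thread idea can be sketched in standard C++ with a thread_local vector, which plays roughly the role of combinable::local(). This is a minimal sketch under stated assumptions. Pixel and get_row() are hypothetical stand-ins for BMM_Color_fl and GetLinearPixels(). process_image() hands out rows through an atomic counter only to approximate how parallel_for distributes iterations. The real PPL scheduler works differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for BMM_Color_fl: one RGBA pixel as four floats.
struct Pixel { float r, g, b, a; };

// Hypothetical stand-in for bmp->GetLinearPixels(0, y, width, out).
void get_row(int y, int width, Pixel* out) {
    for (int x = 0; x < width; ++x)
        out[x] = Pixel{static_cast<float>(x), static_cast<float>(y), 0.0f, 1.0f};
}

std::atomic<int> g_buffer_allocs{0};  // counts how often any row buffer grows

// Each thread gets its own `buf`: it is constructed on that thread's first
// call and destroyed automatically when the thread exits.
Pixel* thread_row_buffer(int width) {
    thread_local std::vector<Pixel> buf;
    assert(width >= 0);
    if (buf.size() < static_cast<std::size_t>(width)) {
        buf.resize(width);  // resize, not reserve: the elements must exist
        ++g_buffer_allocs;
    }
    return buf.data();
}

// Roughly mimics parallel_for(0, height, body): workers grab row indices from
// a shared counter, so any thread may end up processing any row.
long long process_image(int height, int width, unsigned nthreads) {
    std::atomic<int> next_row{0};
    std::atomic<long long> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&] {
            for (int y = next_row++; y < height; y = next_row++) {
                Pixel* line = thread_row_buffer(width);  // reused across rows
                get_row(y, width, line);
                long long row_sum = 0;
                for (int x = 0; x < width; ++x)
                    row_sum += static_cast<long long>(line[x].r);  // "do stuff"
                total += row_sum;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

With a fixed width, each worker thread grows its buffer at most once, so g_buffer_allocs ends up no larger than the number of threads. One design caveat: a thread_local buffer is freed only when its thread exits. In a thread pool, such as the one behind parallel_for, that may be much later than the end of the loop.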

score 0 · Accepted Answer

Instead of having each thread call parallel_for(), have them call another function that allocates the memory, calls parallel_for(), and then frees the memory.
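One possible reading of this answer can be sketched in portable C++, using std::thread instead of PPL. The rows are split into one contiguous range per worker. Each worker allocates its row buffer once, loops over its range and frees the buffer on return. Pixel, get_row(), process_range() and process_image_chunked() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for BMM_Color_fl: one RGBA pixel as four floats.
struct Pixel { float r, g, b, a; };

// Hypothetical stand-in for bmp->GetLinearPixels(0, y, width, out).
void get_row(int y, int width, Pixel* out) {
    for (int x = 0; x < width; ++x)
        out[x] = Pixel{static_cast<float>(x), static_cast<float>(y), 0.0f, 1.0f};
}

// One worker: allocate a row buffer once, process rows [first, last), free it.
long long process_range(int first, int last, int width) {
    std::vector<Pixel> line(width);  // the single allocation for this worker
    long long sum = 0;
    for (int y = first; y < last; ++y) {
        get_row(y, width, line.data());
        for (int x = 0; x < width; ++x)
            sum += static_cast<long long>(line[x].r);  // "do stuff"
    }
    return sum;  // `line` is freed here when it goes out of scope
}

// Splits [0, height) into `nthreads` contiguous chunks, one worker per chunk.
long long process_image_chunked(int height, int width, unsigned nthreads) {
    assert(nthreads > 0);
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        int first = static_cast<int>(static_cast<long long>(height) * t / nthreads);
        int last = static_cast<int>(static_cast<long long>(height) * (t + 1) / nthreads);
        workers.emplace_back([&partial, t, first, last, width] {
            partial[t] = process_range(first, last, width);  // distinct slot per worker
        });
    }
    for (auto& w : workers) w.join();
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

The trade-off is static partitioning. If some rows are much more expensive than others, the workers can finish unevenly, whereas parallel_for can rebalance work between threads at run time.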

score 0 · Accepted Answer

You can use the Concurrency::combinable class to achieve this. I'm too lazy to post code, but I'm sure it's possible.
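The idea behind combinable can be illustrated with a deliberately simplified, hypothetical analogue written in standard C++ (PerThread, not a real library class). local() lazily creates one object per calling thread. combine_each() visits all of those objects after the workers finish, which mirrors the combinable member of the same name. This sketch guards a map with a mutex on every call, so it is much slower than the real class. Pixel, get_row() and process_image() are hypothetical stand-ins as well.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Simplified, hypothetical analogue of Concurrency::combinable<T>.
template <typename T>
class PerThread {
public:
    // Returns the calling thread's own T, creating it on that thread's first call.
    T& local() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<T>& slot = slots_[std::this_thread::get_id()];
        if (!slot) slot = std::make_unique<T>();  // heap object: the map never touches it
        return *slot;
    }
    // Visits every per-thread object; call only after the workers have joined.
    template <typename F>
    void combine_each(F f) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& kv : slots_) f(*kv.second);
    }
private:
    std::mutex mutex_;
    std::unordered_map<std::thread::id, std::unique_ptr<T>> slots_;
};

struct Pixel { float r, g, b, a; };  // hypothetical stand-in for BMM_Color_fl

// Hypothetical stand-in for bmp->GetLinearPixels(0, y, width, out).
void get_row(int y, int width, Pixel* out) {
    for (int x = 0; x < width; ++x)
        out[x] = Pixel{static_cast<float>(x), static_cast<float>(y), 0.0f, 1.0f};
}

// Sums the red channel of every pixel; reports how many row buffers were created.
long long process_image(int height, int width, unsigned nthreads,
                        std::size_t& buffers_created) {
    assert(nthreads > 0);
    PerThread<std::vector<Pixel>> lines;
    std::atomic<int> next_row{0};
    std::atomic<long long> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&] {
            for (int y = next_row++; y < height; y = next_row++) {
                std::vector<Pixel>& line = lines.local();  // reference, not a copy
                if (line.size() < static_cast<std::size_t>(width)) line.resize(width);
                get_row(y, width, line.data());
                long long row_sum = 0;
                for (int x = 0; x < width; ++x)
                    row_sum += static_cast<long long>(line[x].r);
                total += row_sum;
            }
        });
    }
    for (auto& w : workers) w.join();
    buffers_created = 0;
    lines.combine_each([&](std::vector<Pixel>&) { ++buffers_created; });
    return total;
}
```

Because the buffers belong to the PerThread object rather than to the threads, they are all freed when that object goes out of scope at the end of process_image(). This is similar to how the buffers held by a combinable are released with it.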
