c++ - Huge performance difference: Debug vs. Release

Question

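I have a simple algorithm that converts a Bayer image channel (BGGR, RGGB, GBRG, GRBG) to RGB (demosaicing, but without neighbors). In my implementation I pre-set offset vectors that help me translate a Bayer channel index to its corresponding RGB channel indices. The only problem is that I get awful performance in Debug mode with MSVC11. In Release, for an input of size 3264x2540, the function completes in ~60 ms. For the same input in Debug, it completes in ~20,000 ms. That is more than a 300x difference, and since some developers run my application in Debug, it is unacceptable.

My code: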

void ConvertBayerToRgbImageDemosaic(int* BayerChannel, int* RgbChannel, int Width, int Height, ColorSpace ColorSpace)
{
    int rgbOffsets[4]; //translates a color's location in the Bayer block to its location in the RGB block. So R->0, G->1, B->2
    std::vector<int> bayerToRgbOffsets[4]; //the offsets from every color in the Bayer block to the (Bayer) indices it will be copied to (R, B are copied to all indices, Gr to R and Gb to B).
    //calculate offsets according to color space
    switch (ColorSpace)
    {
    case ColorSpace::BGGR:
            /*
             B G
             G R
            */ 
        rgbOffsets[0] = 2; //B->2
        rgbOffsets[1] = 1; //G->1
        rgbOffsets[2] = 1; //G->1
        rgbOffsets[3] = 0; //R->0
        //B is copied to every pixel in its block
        bayerToRgbOffsets[0].push_back(0);
        bayerToRgbOffsets[0].push_back(1);
        bayerToRgbOffsets[0].push_back(Width);
        bayerToRgbOffsets[0].push_back(Width + 1);
        //Gb is copied to its neighbouring B
        bayerToRgbOffsets[1].push_back(-1);
        bayerToRgbOffsets[1].push_back(0);
        //Gr is copied to its neighbouring R
        bayerToRgbOffsets[2].push_back(0);
        bayerToRgbOffsets[2].push_back(1);
        //R is copied to every pixel in its block
        bayerToRgbOffsets[3].push_back(-Width - 1);
        bayerToRgbOffsets[3].push_back(-Width);
        bayerToRgbOffsets[3].push_back(-1);
        bayerToRgbOffsets[3].push_back(0);
        break;
    ... other color spaces
    }

    for (auto row = 0; row < Height; row++)
    {
        for (auto col = 0, bayerIndex = row * Width; col < Width; col++, bayerIndex++)
        {
            auto colorIndex = (row%2)*2 + (col%2); //0...3, For example in BGGR: 0->B, 1->Gb, 2->Gr, 3->R
            //iteration over bayerToRgbOffsets is O(1) since it is either sized 2 or 4.
            std::for_each(bayerToRgbOffsets[colorIndex].begin(), bayerToRgbOffsets[colorIndex].end(), 
                [&](int colorOffset)
                {
                    auto rgbIndex = (bayerIndex + colorOffset) * 3 + rgbOffsets[colorIndex];
                    RgbChannel[rgbIndex] = BayerChannel[bayerIndex];
                });
        }
    }
}

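What I've tried: I tried turning on optimization (/O2) for the Debug build, but there was no significant difference. I tried replacing the inner for_each statement with a plain old for loop, but to no avail. I have a very similar algorithm that converts Bayer to "green" RGB (without copying the data to neighboring pixels in the block), in which I don't use std::vector, and there the runtime difference between Debug and Release is as expected (2x-3x). So, could std::vector be the problem? If so, how do I overcome it?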
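
One way to test the std::vector hypothesis is to keep the same offset tables in fixed-size arrays, so the inner loop never touches a standard container. The following is only a sketch under assumptions, not the original code: it handles BGGR only, assumes even Width and Height, and the names OffsetList and ConvertBggrToRgbNoVector are hypothetical.

```cpp
// Hypothetical fixed-capacity offset list (not part of the original post).
// Only the first `count` entries of `values` are used.
struct OffsetList {
    int count;
    int values[4];
};

// BGGR-only sketch of the loop from the question, with raw arrays in the
// hot path instead of std::vector. Assumes width and height are even.
void ConvertBggrToRgbNoVector(const int* bayer, int* rgb, int width, int height)
{
    // Position in the 2x2 Bayer block -> channel in the RGB triple (R=0, G=1, B=2).
    const int rgbOffsets[4] = { 2, 1, 1, 0 };

    // Offsets (in pixels) from each Bayer position to the pixels it is copied to.
    const OffsetList offsets[4] = {
        { 4, { 0, 1, width, width + 1 } },      // B: every pixel in its block
        { 2, { -1, 0, 0, 0 } },                 // Gb: the B to its left and itself
        { 2, { 0, 1, 0, 0 } },                  // Gr: itself and the R to its right
        { 4, { -width - 1, -width, -1, 0 } }    // R: every pixel in its block
    };

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const int bayerIndex = row * width + col;
            const int colorIndex = (row % 2) * 2 + (col % 2); // 0->B, 1->Gb, 2->Gr, 3->R
            const OffsetList& list = offsets[colorIndex];
            for (int i = 0; i < list.count; ++i) {
                const int rgbIndex = (bayerIndex + list.values[i]) * 3 + rgbOffsets[colorIndex];
                rgb[rgbIndex] = bayer[bayerIndex];
            }
        }
    }
}
```

For a single 2x2 BGGR block with B=10, Gb=20, Gr=30, R=40, the two top output pixels come out as (40, 20, 10) and the two bottom ones as (40, 30, 10). If this version shows the usual 2x-3x gap between Debug and Release, that would point at the container checks rather than the algorithm.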

score 16 · Accepted Answer

Since you use std::vector, it will help to disable iterator debugging.

MSDN shows how to do this.

In simple terms, put this #define before you include any STL headers:

#define _HAS_ITERATOR_DEBUGGING 0

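In my experience, this gives a major boost in the performance of Debug builds, although of course you do lose some debugging functionality.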
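
To make the placement concrete, here is a minimal sketch of a source file that sets the macro before its first standard header and then times an iterator-based loop. The helper names SumWithIterators and TimeSumMs are hypothetical and exist only for illustration. Two caveats, stated as assumptions rather than facts about your project: the same definition should be used consistently in every translation unit that shares containers, and as far as I know newer MSVC versions document _ITERATOR_DEBUG_LEVEL as the replacement for this macro.

```cpp
// Must appear before the first standard library include in this file.
// On compilers other than MSVC this definition has no effect.
#define _HAS_ITERATOR_DEBUGGING 0

#include <chrono>
#include <vector>

// Hypothetical helper: walks a vector through iterators, which is the
// access pattern that iterator debugging instruments in Debug builds.
long long SumWithIterators(const std::vector<int>& values)
{
    long long sum = 0;
    for (std::vector<int>::const_iterator it = values.begin(); it != values.end(); ++it)
        sum += *it;
    return sum;
}

// Hypothetical helper: runs the loop once, stores the result in sumOut,
// and returns the elapsed wall-clock time in milliseconds.
double TimeSumMs(const std::vector<int>& values, long long& sumOut)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    sumOut = SumWithIterators(values);
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Building the same file in Debug with and without the #define and comparing the value returned by TimeSumMs is a quick way to see how much of the slowdown comes from iterator debugging on a given machine.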

score 0 · Accepted Answer

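In VS, the optimization setting used for debugging is Disabled (/Od). Choose one of the other options instead (Minimize Size (/O1), Maximize Speed (/O2), Full Optimization (/Ox), or Custom), along with the iterator optimization that Roger Rowland mentioned...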
