c++ - Flipping a char array vertically: is there a more efficient way?

Question

Let's start with some code:

QByteArray OpenGLWidget::modifyImage(QByteArray imageArray, const int width, const int height){
    if (vertFlip){
        /* Each pixel consists of four unsigned chars: Red Green Blue Alpha.
         * The field is normally 640*480, which means that the whole picture is in fact 640*4 uchars wide.
         * The whole ByteArray is one-dimensional, which means that index 640*4 is the red of the first pixel of the second row.
         * This function is EXTREMELY SLOW
         */
        QByteArray tempArray = imageArray;
        for (int h = 0; h < height; ++h){
            for (int w = 0; w < width/2; ++w){
                for (int i = 0; i < 4; ++i){
                    imageArray.data()[h*width*4 + 4*w + i] = tempArray.data()[h*width*4 + (4*width - 4*w) + i ];
                    imageArray.data()[h*width*4 + (4*width - 4*w) + i] = tempArray.data()[h*width*4 + 4*w + i];
                }
            }
        }
    }
    return imageArray;
}

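This is the code I currently use to vertically flip a 640*480 image (the image is not actually guaranteed to be 640*480, but it is most of the time). The color encoding is RGBA, so the total array size is 640*480*4. I receive images at 30 FPS and want to display them on screen at the same rate.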

On an older CPU (Athlon x2) this code is too much: the CPU struggles to keep up with 30 FPS. So the question is: can I do this more efficiently?

I am also using OpenGL. Is there a trick I don't know about that can flip the image with relatively low CPU/GPU usage?

score 2 · Accepted Answer

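According to this question, you can flip the image in OpenGL by scaling by (1,-1,1). This question explains how to apply the transformation and the scaling.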
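To illustrate what scaling by (1,-1,1) does, here is a minimal sketch in plain C++ with no OpenGL calls. The names `Vec3`, `scale` and `mirror_quad` are hypothetical and exist only for this example. It negates the y coordinate of every vertex of a quad, which is the effect a scale matrix with those factors has on the geometry that carries the texture.

```cpp
#include <array>
#include <cstddef>

// Hypothetical minimal vertex type, used only for this illustration.
struct Vec3 { float x, y, z; };

// Component-wise scale: what a scale matrix diag(sx, sy, sz) does to a point.
Vec3 scale(Vec3 v, Vec3 s)
{
    return Vec3{v.x * s.x, v.y * s.y, v.z * s.z};
}

// Scale every vertex of a quad by (1, -1, 1), i.e. negate each y coordinate.
std::array<Vec3, 4> mirror_quad(const std::array<Vec3, 4>& quad)
{
    std::array<Vec3, 4> out{};
    for (std::size_t i = 0; i < quad.size(); ++i)
        out[i] = scale(quad[i], Vec3{1.0f, -1.0f, 1.0f});
    return out;
}
```

In the legacy fixed-function pipeline the corresponding call would be `glScalef(1.0f, -1.0f, 1.0f)`. Note that a mirroring scale reverses the winding order of the quad, which matters if back-face culling is enabled.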

score 1 · Accepted Answer

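You can at least improve it by working block by block so that you make use of the cache architecture. In your example, one of the two accesses (either the read or the write) will miss the cache.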

score 0 · Accepted Answer

First, if you are looping over the pixels of an image with two loops, it can help to "capture the scanline", like so:

for (int y = 0; y < height; ++y)
{
    // Capture scanline.
    char* scanline = imageArray.data() + y*width*4;

    for (int x = 0; x < width/2; ++x)
    {
        const int flipped_x = width - x-1;
        for (int i = 0; i < 4; ++i)
            std::swap(scanline[x*4 + i], scanline[flipped_x*4 + i]);
    }
}

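Another thing to note is that I used swap instead of a temporary image. That tends to be more efficient, since the swap can be done in registers instead of loading pixels from a copy of the entire image.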

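But if you are going to do something like this, it also generally helps to work with 32-bit integers instead of one byte at a time. If your pixels are stored as 8-bit types but you know that each pixel is 32 bits, as in your case, you can generally get away with a cast to uint32_t*, e.g.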

for (int y = 0; y < height; ++y)
{
    uint32_t* scanline = (uint32_t*)imageArray.data() + y*width;
    std::reverse(scanline, scanline + width);
}

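At this point you might parallelize the y loop. Flipping an image horizontally this way (it should be "horizontal", if I understood your original code correctly) has a slightly tricky access pattern, but you should be able to get a fairly decent boost with the techniques above.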
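Since the title asks about a vertical flip while the loop in the question mirrors each row left to right, the following sketch contrasts the two operations. It is an illustration under assumptions, not code from the question: it stores pixels in a `std::vector<std::uint32_t>` (one element per RGBA pixel, row-major) instead of a `QByteArray` so that it stays self-contained, and the names `flip_rows` and `flip_columns` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Top-to-bottom flip: swap row y with row (height - 1 - y).
// Each row is moved as one contiguous run of `width` pixels.
void flip_rows(std::vector<std::uint32_t>& pixels,
               std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height / 2; ++y) {
        std::uint32_t* top = pixels.data() + y * width;
        std::uint32_t* bottom = pixels.data() + (height - 1 - y) * width;
        std::swap_ranges(top, top + width, bottom);
    }
}

// Left-to-right flip: reverse the pixels inside every row.
void flip_columns(std::vector<std::uint32_t>& pixels,
                  std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y) {
        std::uint32_t* row = pixels.data() + y * width;
        std::reverse(row, row + width);
    }
}
```

Both functions work in place and touch each pixel at most once, so neither needs a second copy of the image.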

"I am also using OpenGL. Is there a trick I don't know about that can flip the image with relatively low CPU/GPU usage?"

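Naturally, the fastest way to flip an image is not to touch its pixels at all and to save the flip for the last part of the pipeline, when you render the result. For that, you can render the texture in OGL with negative scaling instead of modifying the texture's pixels.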

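Another thing that is really useful in video and image processing is to represent the image like this for all of your image operations: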

struct Image32
{
     uint32_t* pixels;
     int32_t width;
     int32_t height;
     int32_t x_stride;
     int32_t y_stride;
};

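The stride fields are what you use to get from one scanline (row) of an image to the next vertically, and from one column to the next horizontally. With this representation you can use negative values for the strides and offset the pixels pointer accordingly. You can also use the stride fields to, for example, render only every other scanline of an image for fast interactive half-res scanline previews, by using y_stride = width*2 and height /= 2. You can quarter-res an image by setting the x stride to 2 and the y stride to 2*width and then halving the width and the height. You can render a cropped image without making your blit functions accept a boatload of parameters: just modify these fields and keep the y stride at the full image's width, so that it still gets from one row of the cropped section to the next: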

// Using the stride representation of Image32, this can now
// blit a cropped source, a horizontally flipped source, 
// a vertically flipped source, a source flipped both ways,
// a half-res source, a quarter-res source, a quarter-res
// source that is horizontally flipped and cropped, etc,
// and all without modifying the source image in advance
// or having to accept all kinds of extra drawing parameters.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src);

// We don't have to do things like this (and I think I lost
// some capabilities with this version below but it hurts my 
// brain too much to think about what capabilities were lost):
void blit_gross(int dst_x, int dst_y, int dst_w, int dst_h, uint32_t* dst, 
                int src_x, int src_y, int src_w, int src_h, 
                const uint32_t* src, bool flip_x, bool flip_y);

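By using negative stride values and passing the image to an image operation (e.g. a blit operation), the result is naturally flipped without the image ever having to be flipped itself. It ends up being "drawn flipped", so to speak, just as when you use OGL with a negative-scaling transformation matrix.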
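As a rough sketch of how the stride representation can produce a flip without moving any source pixels, the code below fills in one possible `blit` together with two helpers that build flipped views. It assumes that strides are measured in pixels (uint32_t elements) rather than bytes. The helper names `flipped_vertically` and `flipped_horizontally` are hypothetical, and the clipping is deliberately naive.

```cpp
#include <cstdint>

struct Image32
{
    uint32_t* pixels;
    int32_t width;
    int32_t height;
    int32_t x_stride;  // assumed: step in pixels to the next column
    int32_t y_stride;  // assumed: step in pixels to the next row
};

// View of the same pixels read bottom-to-top: start at the last row
// and negate the row stride. No pixel data is copied or modified.
Image32 flipped_vertically(Image32 img)
{
    img.pixels += (img.height - 1) * img.y_stride;
    img.y_stride = -img.y_stride;
    return img;
}

// View of the same pixels read right-to-left.
Image32 flipped_horizontally(Image32 img)
{
    img.pixels += (img.width - 1) * img.x_stride;
    img.x_stride = -img.x_stride;
    return img;
}

// Copy src into dst with its top-left corner at (dst_x, dst_y),
// skipping any pixel that would fall outside dst.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src)
{
    for (int y = 0; y < src.height; ++y) {
        const int ty = dst_y + y;
        if (ty < 0 || ty >= dst.height) continue;
        for (int x = 0; x < src.width; ++x) {
            const int tx = dst_x + x;
            if (tx < 0 || tx >= dst.width) continue;
            dst.pixels[ty * dst.y_stride + tx * dst.x_stride] =
                src.pixels[y * src.y_stride + x * src.x_stride];
        }
    }
}
```

With this, `blit(0, 0, dst, flipped_vertically(src))` draws `src` upside down into `dst` while `src` itself stays untouched. The two helpers can also be combined to flip both ways.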
