c++ - glTexSubImage2D is slow and uses 4% of the CPU

Question

I am using glTexSubImage2D to update a window with OpenGL.

I see that this function takes a long time to return, and it also uses 4% of the CPU.

Here is the code I use:

glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, (*i)->getTextureID());
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, (*i)->getWidth(), (*i)->getHeightView(),
    GL_BGRA, GL_UNSIGNED_BYTE,(*i)->getBuffer());

Does anyone know of a better implementation? Something that performs better and uses less CPU?

Right now this makes my program sluggish.

score 6 · Accepted Answer

There are some things you can do, though how much you can benefit from them depends on the circumstances.

First, make sure that your pixel upload format is correct for the driver's needs. You seem to have that taken care of with GL_BGRA, GL_UNSIGNED_BYTE, which is likely the driver's preferred format for GL_RGBA8 image formats.

However, if you happen to have access to OpenGL 4.3 or a driver that implements ARB_internalformat_query2, you can actually detect at runtime what the preferred upload format will be. Like this:

// Ask the driver which client format/type it prefers for uploads to GL_RGBA8 textures.
GLint pixelFormat, pixelType;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_TEXTURE_IMAGE_FORMAT, 1, &pixelFormat);
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_TEXTURE_IMAGE_TYPE, 1, &pixelType);

Of course, this means that you will need to be able to modify your data generation method to generate data in the above format/type pair.

Once you've taken steps to appease the driver, your next option is to use buffer objects (pixel buffer objects, bound to GL_PIXEL_UNPACK_BUFFER) to store your pixel transfer data. This probably won't help overall performance, but it can reduce the CPU burden.

However, in order to take the best advantage of this, you need to be able to generate your pixel data "directly" into the buffer object's memory by mapping it. If you are able to do this, then you can probably get back some of the CPU cost of the upload. Otherwise, it may not be worthwhile.

If you do this, you should use proper buffer object streaming techniques.
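As a rough illustration of mapping a buffer object and generating pixels directly into it, here is a minimal sketch. It assumes a current OpenGL 2.1+ context, a function loader such as GLEW that is already initialised, tightly packed BGRA rows, and a `pbo` and `texture` created elsewhere. `generatePixels` and `uploadViaPbo` are hypothetical names, not part of the question's code, and re-specifying the buffer storage each frame (orphaning) is only one of several streaming approaches.

```cpp
// Assumption: GLEW (or another loader) is initialised and a GL context is current.
#include <GL/glew.h>
#include <cstdint>

// Hypothetical producer: writes width * height BGRA pixels straight into dst.
void generatePixels(std::uint8_t* dst, int width, int height);

void uploadViaPbo(GLuint pbo, GLuint texture, int width, int height)
{
    const GLsizeiptr size = static_cast<GLsizeiptr>(width) * height * 4;

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    // Orphan the previous storage so the driver need not wait on a pending transfer.
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size, nullptr, GL_STREAM_DRAW);

    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    if (dst != nullptr) {
        // Generate the frame directly in buffer memory; no intermediate CPU copy.
        generatePixels(static_cast<std::uint8_t*>(dst), width, height);

        // glUnmapBuffer returns GL_FALSE if the buffer contents were lost.
        if (glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER) == GL_TRUE) {
            glBindTexture(GL_TEXTURE_2D, texture);
            // With a PBO bound, the last argument is a byte offset into the buffer.
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                            GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
        }
    }
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```

If the pixel data already lives in an ordinary CPU buffer and has to be copied into the mapped pointer, most of the CPU saving described above is likely lost.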

Double-buffering your texture may also help. That is, while you're rendering from one texture object, you're uploading to another one. This will prevent GPU stalls that wait for the prior rendering to complete. How much this helps really depends on how you're rendering.
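The bookkeeping for this kind of double-buffering can be sketched as follows. `PingPong` is a hypothetical helper, not something from the question or from OpenGL; it only tracks which of two texture objects receives this frame's upload and which one is drawn, and the actual glTexSubImage2D and draw calls are left out.

```cpp
#include <cstddef>

// Hypothetical helper: alternates between two textures, one being uploaded to
// and the other being rendered from.
struct PingPong {
    std::size_t frame = 0;

    // Index of the texture that receives this frame's upload.
    std::size_t uploadIndex() const { return frame % 2; }

    // Index of the texture to render from (the one uploaded on the previous frame).
    std::size_t drawIndex() const { return (frame + 1) % 2; }

    // Call once at the end of every frame to swap the two roles.
    void advance() { ++frame; }
};
```

With two texture IDs in an array, each frame would upload to `textures[p.uploadIndex()]`, draw with `textures[p.drawIndex()]`, and then call `p.advance()`. The cost of this scheme is one frame of latency, and the draw texture holds no uploaded data on the very first frame.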

Without knowing more about the specific circumstances of your application, there's not much more that can be said.

score 3 · Accepted Answer

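If your texture really does change every frame, then you will want to use double buffering to transfer the data to the GPU. (If it does not change every frame, the obvious optimization is to upload it only once!)

Each frame, you upload data into one buffer and draw from the other, then swap which buffer is used for which on the next frame. This speeds things up because the GPU does not have to wait for the memory transfer to finish.

A tutorial on PBOs is somewhat beyond the scope of my answer, but "OpenGL Pixel Buffer Objects" is a good reference, and I would look at the "OGL Samples" repository to see how PBOs work.

However, if you cannot compute texture frames ahead of time, there is no real advantage to using PBOs. Just use glTexSubImage2D.

That said, 4% of the CPU is probably not a problem.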

score 1 · Accepted Answer

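You should not be changing texture data every frame just to update your screen. Textures are meant to be loaded once and changed rarely, if ever. If you are trying to write individual pixels to the screen, I would suggest not using OpenGL and using something better suited to the task, such as SDL.

Edit: OK, this is not necessarily true. See the discussion below.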

score 0 · Accepted Answer

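As I understand from the comment thread on this answer, you are rendering a website on the CPU side (or the rendered image passes through the CPU) and then applying OpenGL shaders to it. If so, you need a GPU-side renderer that renders the web page and applies the shaders on the GPU. That way you no longer upload every frame to the GPU through the CPU, and the CPU no longer has to do the rendering, which is how it was meant to be.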
