c++ - Custom glBlendFunc a lot slower than native

Question

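I'm trying to implement my own custom glBlendFunc in a fragment shader. However, my solution is a lot slower than the native glBlendFunc, even when both perform exactly the same blend function.

Does anyone have suggestions for doing this more efficiently?

My solution looks like this: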

void draw(fbo fbos[2], render_item item)
{
   // fbos[0] is the render target
   // fbos[1] is the previous render target used to read "background" to blend against in shader
   // Both fbos have exactly the same content, however they need to be different since we can't both read and write to the same texture. The texture we render to needs to have the entire content since we might not draw geometry everywhere.

   fbos[0]->attach(); // Attach fbo
   fbos[1]->bind(1); // Bind as texture 1

   render(item);

   glCopyTexSubImage2D(...); // copy from fbos[0] to fbos[1], fbos[1] == fbos[0]
}

fragment.glsl

vec4 blend_color(vec4 fore) 
{   
    vec4 back = texture2D(background, gl_TexCoord[1].st); // background is read from texture "1"
    return vec4(mix(back.rgb, fore.rgb, fore.a), back.a + fore.a);  
}

score 3 · Accepted Answer

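Your best bet for improving the performance of FBO-based blending is NV_texture_barrier. Despite the name, AMD has implemented it as well, so if you stick to Radeon HD-class cards, it should be available to you.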

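Basically, it lets you ping-pong without heavyweight operations such as FBO binding or texture attachment changes. The specification has a section towards the bottom that shows the general algorithm.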
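As a rough illustration of the control flow (not the specification's own listing), the sketch below issues a barrier only when a draw overlaps a region already written since the last barrier, because the extension only permits each texel to be read and written once between barriers. `Rect`, `Item`, `draw_items`, and the `render` and `barrier` callbacks are hypothetical names introduced here; in a real program the `barrier` callback would presumably wrap `glTextureBarrierNV()`, and `render` would draw with the same texture bound both as the render target and as the background sampler.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical axis-aligned screen rectangle; empty when it has no area.
struct Rect {
    int x0, y0, x1, y1;
    bool empty() const { return x0 >= x1 || y0 >= y1; }
};

inline bool intersects(const Rect& a, const Rect& b) {
    if (a.empty() || b.empty()) return false;
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

inline Rect unite(const Rect& a, const Rect& b) {
    if (a.empty()) return b;
    if (b.empty()) return a;
    return Rect{std::min(a.x0, b.x0), std::min(a.y0, b.y0),
                std::max(a.x1, b.x1), std::max(a.y1, b.y1)};
}

// Hypothetical render item: only its screen-space bounds matter here.
struct Item {
    Rect bounds;
};

// Draws every item and calls barrier() only when an item overlaps the
// (conservative) bounding box of what was drawn since the last barrier.
// Returns the number of barriers issued.
inline int draw_items(const std::vector<Item>& items,
                      const std::function<void(const Item&)>& render,
                      const std::function<void()>& barrier) {
    Rect dirty{0, 0, 0, 0};  // nothing written yet
    int barriers = 0;
    for (const Item& item : items) {
        if (intersects(dirty, item.bounds)) {
            barrier();  // assumption: wraps glTextureBarrierNV() in real code
            ++barriers;
            dirty = Rect{0, 0, 0, 0};
        }
        render(item);
        dirty = unite(dirty, item.bounds);
    }
    return barriers;
}
```

Tracking a single bounding box is conservative: it may trigger a barrier for items that do not actually overlap earlier geometry, but it never skips one that is needed.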

Another alternative is EXT_shader_image_load_store. This requires DX11/GL 4.x-class hardware. OpenGL 4.2 recently promoted this to core via ARB_shader_image_load_store.

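Even so, as Darcy said, you'll never beat regular blending. It uses special hardware structures that shaders can't access (since they operate after the shader has run). You should only do programmatic blending if there's some effect you absolutely cannot accomplish any other way.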

score 2 · Accepted Answer

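Native blending is much more efficient because the blend operations are built directly into the GPU hardware, so you probably won't be able to beat it for speed. That said, make sure you have turned off depth testing, back-face culling, hardware blending, and any other operations you don't need. I can't say it will make a huge difference, but it might make some.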
