Having done this exact thing myself, I see a few things here that could be optimized.
First, I'd remove the enableTexture conditional and instead split your shader into two programs, one for the true state of this and one for false. Conditionals are very expensive in iOS fragment shaders, particularly ones that have texture reads within them.
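As a rough sketch (the enableTexture flag and the flat-color fallback here are my assumptions about what your shader does, not code from it), the two resulting programs might look like this:

// Variant 1, compiled as its own program: texturing enabled.
precision mediump float;

uniform sampler2D inputImageTexture;
varying vec2 textureCoordinate;

void main()
{
    gl_FragColor = texture2D(inputImageTexture, textureCoordinate);
}

// Variant 2, compiled as a second, separate program: texturing disabled,
// just output a flat color (hypothetical uniform name):
//
//     precision mediump float;
//     uniform lowp vec4 flatColor;
//
//     void main()
//     {
//         gl_FragColor = flatColor;
//     }

At draw time you pick the right program with glUseProgram(), so the decision is made once per draw call instead of once per fragment.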
Second, you have nine dependent texture reads here. These are texture reads where the texture coordinates are calculated within the fragment shader. Dependent texture reads are very expensive on the PowerVR GPUs in iOS devices, because they prevent the hardware from optimizing texture reads using caching and the like. Because you are sampling at fixed offsets for the eight surrounding pixels and one central one, these calculations should be moved up into the vertex shader. This also means that these calculations won't have to be performed for each pixel, just once per vertex, and hardware interpolation will handle the rest.
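To make "dependent" concrete, here is a minimal contrast (the names are borrowed from the shaders below, and the output combination is just a placeholder):

precision mediump float;

uniform sampler2D inputImageTexture;
uniform highp float texelWidth;
varying vec2 textureCoordinate;
varying vec2 rightTextureCoordinate; // filled in by the vertex shader

void main()
{
    // Dependent texture read: the coordinate is computed here in the
    // fragment shader, so the GPU cannot prefetch the texel.
    vec4 slowSample = texture2D(inputImageTexture,
                                textureCoordinate + vec2(texelWidth, 0.0));

    // Non-dependent texture read: the coordinate comes straight from a
    // varying interpolated by the hardware, which can be prefetched.
    vec4 fastSample = texture2D(inputImageTexture, rightTextureCoordinate);

    gl_FragColor = 0.5 * (slowSample + fastSample); // placeholder output
}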
Third, for() loops haven't been handled all that well by the iOS shader compiler to date, so I tend to avoid those where I can.
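For instance, rather than writing the nine taps as a loop over hypothetical offset and weight arrays like this (a deliberately naive sketch of the pattern to avoid, not code from the framework):

precision highp float;

uniform sampler2D inputImageTexture;
uniform float weights[9]; // hypothetical kernel weights
uniform vec2 offsets[9];  // hypothetical per-tap offsets
varying vec2 textureCoordinate;

void main()
{
    vec4 sum = vec4(0.0);
    // Both the loop itself and the in-shader coordinate math here are what
    // you want to avoid on iOS; compare with the unrolled shader below.
    for (int i = 0; i < 9; i++)
    {
        sum += texture2D(inputImageTexture, textureCoordinate + offsets[i]) * weights[i];
    }
    gl_FragColor = sum;
}

I unroll the nine samples by hand, as the fragment shader below does.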
As I mentioned, I've done convolution shaders like this in my open source iOS GPUImage framework. For a generic convolution filter, I use the following vertex shader:
attribute vec4 position;
attribute vec4 inputTextureCoordinate;

uniform highp float texelWidth;
uniform highp float texelHeight;

varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;
varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;
varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;

void main()
{
    gl_Position = position;

    // One-texel offsets in each direction.
    vec2 widthStep = vec2(texelWidth, 0.0);
    vec2 heightStep = vec2(0.0, texelHeight);
    vec2 widthHeightStep = vec2(texelWidth, texelHeight);
    vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);

    // Calculate all nine sample coordinates once per vertex, so that none
    // of the texture reads in the fragment shader is a dependent read.
    textureCoordinate = inputTextureCoordinate.xy;
    leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
    rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;

    topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
    topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
    topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;

    bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
    bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
    bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
}
and the following fragment shader:
precision highp float;

uniform sampler2D inputImageTexture;
uniform mediump mat3 convolutionMatrix;

varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;
varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;
varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;

void main()
{
    // Every coordinate comes straight from a varying, so none of these
    // nine reads is a dependent texture read.
    mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate);
    mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate);
    mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate);
    mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
    mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate);
    mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate);
    mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate);
    mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate);
    mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate);

    // Weight each sample by the matching entry of the 3x3 kernel.
    mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];
    resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
    resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];

    gl_FragColor = resultColor;
}
The texelWidth and texelHeight uniforms are the inverse of the width and height of the input image, and the convolutionMatrix uniform specifies the weights for the various samples in your convolution.
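So for a 640x480 frame you would set texelWidth to 1.0/640.0 and texelHeight to 1.0/480.0. As an example of the kernel (my illustrative values, not something specific to the framework), a standard 3x3 Laplacian edge-detection kernel would be:

// Example values for convolutionMatrix: a 3x3 Laplacian edge detector.
// Shown as a GLSL constant for illustration; in practice you'd upload the
// nine floats from the host side with glUniformMatrix3fv(). The kernel is
// symmetric, so column-major vs. row-major order doesn't matter here.
const mediump mat3 laplacianKernel = mat3(-1.0, -1.0, -1.0,
                                          -1.0,  8.0, -1.0,
                                          -1.0, -1.0, -1.0);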
On an iPhone 4, this runs in 4-8 ms for a 640x480 frame of camera video, which is good enough for 60 FPS rendering at that image size. If you only need to do something like edge detection, you can simplify the above: convert the image to luminance in a pre-pass, then sample from only one color channel. That's even faster, at about 2 ms per frame on the same device.
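A minimal sketch of that luminance pre-pass (using the standard Rec. 709 luminance weights; not necessarily the exact shader from the framework) could look like:

precision mediump float;

uniform sampler2D inputImageTexture;
varying vec2 textureCoordinate;

// Rec. 709 weights for converting RGB to luminance.
const vec3 luminanceWeighting = vec3(0.2125, 0.7154, 0.0721);

void main()
{
    vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);
    float luminance = dot(textureColor.rgb, luminanceWeighting);

    // Write the same luminance to every channel; the convolution pass can
    // then read a single channel instead of weighting all four.
    gl_FragColor = vec4(vec3(luminance), textureColor.a);
}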