I have a shader that looks like this:
void main( in float2 pos : TEXCOORD0,
           in uniform sampler2D data : TEXUNIT0,
           in uniform sampler2D palette : TEXUNIT1,
           in uniform float c,
           in uniform float th0,
           in uniform float th1,
           in uniform float th2,
           in uniform float4 BackGroundColor,
           out float4 color : COLOR
         )
{
    const float4 dataValue = tex2D( data, pos );
    const float vValue = dataValue.x;
    const float tValue = dataValue.y;
    color = BackGroundColor;
    if ( tValue <= th2 )
    {
        if ( tValue < th1 )
        {
            const float vRealValue = abs( vValue - 0.5 );
            if ( vRealValue > th0 )
            {
                // determine value and color
                const float power = ( c > 0.0 ) ? vValue : ( 1.0 - vValue );
                color = tex2D( palette, float2( power, 0.0 ) );
            }
        }
        else
        {
            color = float4( 0.0, tValue, 0.0, 1.0 );
        }
    }
}
I'm compiling it with:
cgc -profile arbfp1 -strict -O3 -q sh.cg -o sh.asm
Now, different versions of the Cg compiler produce different output.
cgc version 2.2.0006 compiles the shader into 18 assembly instructions:
!!ARBfp1.0
PARAM c[6] = { program.local[0..4],{ 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R0.z, -R0.x, c[5].y;
CMP R0.z, -c[0].x, R0.x, R0;
MOV R0.w, c[5].x;
TEX R1, R0.zwzw, texture[1], 2D;
SLT R0.z, R0.y, c[2].x;
ADD R0.x, R0, -c[5].z;
ABS R0.w, R0.x;
SGE R0.x, c[3], R0.y;
MUL R2.x, R0, R0.z;
SLT R0.w, c[1].x, R0;
ABS R2.y, R0.z;
MUL R0.z, R2.x, R0.w;
CMP R0.w, -R2.y, c[5].x, c[5].y;
CMP R1, -R0.z, R1, c[4];
MUL R2.x, R0, R0.w;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 18 instructions, 3 R-regs
cgc version 3.0.0016 compiles the same shader into 23 assembly instructions:
!!ARBfp1.0
PARAM c[6] = { program.local[0..4], { 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R1.y, R0.x, -c[5].z;
MOV R1.z, c[0].x;
ABS R1.y, R1;
SLT R1.z, c[5].x, R1;
SLT R1.x, R0.y, c[2];
SGE R0.z, c[3].x, R0.y;
MUL R0.w, R0.z, R1.x;
SLT R1.y, c[1].x, R1;
MUL R0.w, R0, R1.y;
ABS R1.z, R1;
CMP R1.y, -R1.z, c[5].x, c[5];
MUL R1.y, R0.w, R1;
ADD R1.z, -R0.x, c[5].y;
CMP R1.z, -R1.y, R1, R0.x;
ABS R0.x, R1;
CMP R0.x, -R0, c[5], c[5].y;
MOV R1.w, c[5].x;
TEX R1, R1.zwzw, texture[1], 2D;
CMP R1, -R0.w, R1, c[4];
MUL R2.x, R0.z, R0;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 23 instructions, 3 R-regs
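To compare the two listings side by side, here is a rough C model of what each one computes, based on my own reading of the assembly. It assumes that program.local[0..4] hold c, th0, th1, th2 and BackGroundColor in declaration order. Both texture fetches are replaced by stand-ins: v and t are passed in directly, and palette_lookup is a hypothetical placeholder for the palette texture. Since arbfp1 has no flow-control instructions, both listings evaluate every branch and then pick the result with SLT/SGE (which yield 1.0 or 0.0) and CMP (which selects b when a < 0, else c). As far as I can tell, the visible difference is that 2.2.0006 computes the c > 0 select unconditionally, while 3.0.0016 additionally masks it with the branch condition.

```c
#include <assert.h>
#include <math.h>

/* Minimal stand-in for Cg's float4. */
typedef struct { float x, y, z, w; } vec4;

/* Assumed binding: program.local[0..4] = c, th0, th1, th2, BackGroundColor. */
typedef struct { float c, th0, th1, th2; vec4 bg; } Uniforms;

/* Hypothetical replacement for tex2D(palette, float2(u, 0.0)). */
static vec4 palette_lookup(float u) {
    assert(u >= 0.0f && u <= 1.0f); /* palette coordinate stays in [0, 1] */
    vec4 r = { u, 1.0f - u, 0.25f, 1.0f };
    return r;
}

/* ARB_fragment_program semantics: SLT/SGE yield 1.0 or 0.0, CMP selects on a < 0. */
static float arb_slt(float a, float b) { return a < b ? 1.0f : 0.0f; }
static float arb_sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }
static float arb_cmp(float a, float b, float c) { return a < 0.0f ? b : c; }
static vec4 arb_cmp4(float a, vec4 b, vec4 c) { return a < 0.0f ? b : c; }

/* The original Cg logic with real branches; v and t come from tex2D(data, pos). */
static vec4 shade_reference(float v, float t, Uniforms u) {
    vec4 color = u.bg;
    if (t <= u.th2) {
        if (t < u.th1) {
            if (fabsf(v - 0.5f) > u.th0) {
                float power = (u.c > 0.0f) ? v : (1.0f - v);
                color = palette_lookup(power);
            }
        } else {
            vec4 green = { 0.0f, t, 0.0f, 1.0f };
            color = green;
        }
    }
    return color;
}

/* cgc 2.2.0006 listing: power and the palette fetch are computed unconditionally. */
static vec4 shade_cgc22(float v, float t, Uniforms u) {
    float power = arb_cmp(-u.c, v, 1.0f - v);             /* ADD, CMP */
    vec4 pal = palette_lookup(power);                      /* TEX */
    float inner = arb_slt(t, u.th1);                       /* t < th1 */
    float outer = arb_sge(u.th2, t);                       /* t <= th2 */
    float far = arb_slt(u.th0, fabsf(v - 0.5f));           /* |v - 0.5| > th0 */
    float use_pal = outer * inner * far;                   /* MUL, MUL */
    float not_inner = arb_cmp(-fabsf(inner), 0.0f, 1.0f);  /* ABS, CMP */
    vec4 r1 = arb_cmp4(-use_pal, pal, u.bg);
    vec4 green = { 0.0f, t, 0.0f, 1.0f };
    return arb_cmp4(-(outer * not_inner), green, r1);
}

/* cgc 3.0.0016 listing: the c > 0 select is additionally masked by the branch condition. */
static vec4 shade_cgc30(float v, float t, Uniforms u) {
    float c_pos = arb_slt(0.0f, u.c);                      /* SLT R1.z */
    float inner = arb_slt(t, u.th1);
    float outer = arb_sge(u.th2, t);
    float far = arb_slt(u.th0, fabsf(v - 0.5f));
    float use_pal = outer * inner * far;
    float not_c_pos = arb_cmp(-fabsf(c_pos), 0.0f, 1.0f);  /* ABS, CMP */
    float flip = use_pal * not_c_pos;                      /* extra MUL */
    float power = arb_cmp(-flip, 1.0f - v, v);
    vec4 pal = palette_lookup(power);
    vec4 r1 = arb_cmp4(-use_pal, pal, u.bg);
    float not_inner = arb_cmp(-fabsf(inner), 0.0f, 1.0f);
    vec4 green = { 0.0f, t, 0.0f, 1.0f };
    return arb_cmp4(-(outer * not_inner), green, r1);
}

static int vec4_equal(vec4 a, vec4 b) {
    return a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w;
}

/* Sweeps v and t over [0, 1]; counts results where either listing differs from the reference. */
int count_mismatches(Uniforms u) {
    int bad = 0;
    for (int i = 0; i <= 20; ++i) {
        for (int j = 0; j <= 20; ++j) {
            float v = i / 20.0f, t = j / 20.0f;
            vec4 ref = shade_reference(v, t, u);
            if (!vec4_equal(ref, shade_cgc22(v, t, u))) ++bad;
            if (!vec4_equal(ref, shade_cgc30(v, t, u))) ++bad;
        }
    }
    return bad;
}
```

In this model both listings agree with the branching version for any set of uniforms passed to count_mismatches, so the extra 3.0.0016 instructions would only affect cost, not the output.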
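Strangely, the optimization level in Cg 3.0 doesn't seem to have any effect.
Can someone explain what is going on? Why doesn't the optimization have any effect, and why is the shader longer when I compile it with Cg 3.0?
Note that I removed the comments from the compiled shaders.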