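I have noticed that computationally intensive shader functions tend to stop computing before all threadgroups have finished. When I use MTLComputePipelineState and MTLComputeCommandEncoder to blur an image with a very large blur radius, the resulting image is only half processed, and you can actually see half-finished threadgroups. I haven't narrowed down the exact blur radius at which this starts, but 16 pixels works fine and 32 is already far too much: not even half of the groups get computed.
So is there a limit on how long a shader function call may take to finish, or something along those lines? I have just worked through most of the documentation on how to use the Metal framework, and I don't remember coming across any statement like that.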
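Edit
Since in my case the problem turned out to be some internal error rather than a simple timeout, I'll add some code.
The most expensive part is a block-matching algorithm that finds matching blocks in two images (i.e. consecutive frames of a movie):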
    // Exhaustive-search block-matching algorithm
    kernel void naiveMotion(
        texture2d<float, access::read> inputImage1 [[ texture(0) ]],
        texture2d<float, access::read> inputImage2 [[ texture(1) ]],
        texture2d<float, access::write> outputImage [[ texture(2) ]],
        uint2 gid [[ thread_position_in_grid ]]
    )
    {
        // area to search for matches
        float searchSize = 10.0;
        int searchRadius = searchSize/2;
        // window size to search in
        int kernelSize = 6;
        int kernelRadius = kernelSize/2;
        // this will store the motion direction
        float2 vector = float2(0.0, 0.0);
        float2 maxVector = float2(searchSize, searchSize/2);
        float maxVectorLength = length(maxVector);
        // maximum error caused by noise
        float error = kernelSize*kernelSize*(10.0/255.0);
        for (int y = -searchRadius; y < searchRadius; ++y)
        {
            for (int x = 0; x < searchSize; ++x)
            {
                float diff = 0;
                for (int b = -kernelRadius; b < kernelRadius; ++b)
                {
                    for (int a = -kernelRadius; a < kernelRadius; ++a)
                    {
                        uint2 textureIndex(gid.x + x + a, gid.y + y + b);
                        float4 targetColor = inputImage2.read(textureIndex).rgba;
                        float4 referenceColor = inputImage1.read(gid).rgba;
                        float targetGray = 0.299*targetColor.r + 0.587*targetColor.g + 0.114*targetColor.b;
                        float referenceGray = 0.299*referenceColor.r + 0.587*referenceColor.g + 0.114*referenceColor.b;
                        diff = diff + abs(targetGray - referenceGray);
                    }
                }
                if ( error > diff )
                {
                    error = diff;
                    // vertical motion is rather irrelevant, but negative values can't be stored, so just take the absolute value
                    vector = float2(x, abs(y));
                }
            }
        }
        float intensity = length(vector)/maxVectorLength;
        outputImage.write(float4(normalize(vector), intensity, 1), gid);
    }
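For comparing against whatever the GPU manages to write, a CPU reference for a single pixel could look roughly like the sketch below. It is plain C++ rather than Metal, and several things in it are assumptions made for illustration only: the `GrayImage` type, the `makeConstantImage` helper, inputs that are already grayscale, and clamping of coordinates at the image border (the kernel above does no bounds handling). The loop structure and the initial error threshold are copied from `naiveMotion`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper type: a grayscale image stored row-major.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;

    // Reads a pixel, clamping coordinates to the image edge.
    // The clamping is an assumption of this sketch, not part of the shader.
    float at(int x, int y) const {
        assert(width > 0 && height > 0);
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Hypothetical helper: builds an image filled with one value.
GrayImage makeConstantImage(int width, int height, float value) {
    return GrayImage{width, height,
                     std::vector<float>(static_cast<std::size_t>(width) * height, value)};
}

struct Motion {
    float x;
    float y;
};

// Mirrors the loops of naiveMotion for the single pixel (gx, gy).
Motion naiveMotionCPU(const GrayImage& reference, const GrayImage& target,
                      int gx, int gy, float searchSize, int kernelSize) {
    const int searchRadius = static_cast<int>(searchSize / 2);
    const int kernelRadius = kernelSize / 2;

    // Same initial threshold as the shader: maximum error attributed to noise.
    float error = kernelSize * kernelSize * (10.0f / 255.0f);
    Motion best{0.0f, 0.0f};

    // The shader re-reads this value in the innermost loop; it does not
    // depend on a or b, so it is read once here.
    const float referenceGray = reference.at(gx, gy);

    for (int y = -searchRadius; y < searchRadius; ++y) {
        for (int x = 0; x < searchSize; ++x) {
            float diff = 0.0f;
            for (int b = -kernelRadius; b < kernelRadius; ++b) {
                for (int a = -kernelRadius; a < kernelRadius; ++a) {
                    const float targetGray = target.at(gx + x + a, gy + y + b);
                    diff += std::fabs(targetGray - referenceGray);
                }
            }
            if (error > diff) {
                error = diff;
                best = Motion{static_cast<float>(x),
                              static_cast<float>(std::abs(y))};
            }
        }
    }
    return best;
}
```

Running this over a handful of pixels and comparing with the texture the shader produced should at least show whether the threadgroups that did finish wrote the expected values. One property of the loops worth keeping in mind when reading the results: because the comparison is a strict `error > diff`, ties go to the first candidate visited, so two identical flat images yield the vector (0, searchRadius) rather than (0, 0).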
I use this shader on a 960x540 px image. With a searchSize of 9 and a kernelSize of 8, the shader runs over the whole image. Changing searchSize to 10 makes the shader stop early with error code 1.
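To put numbers on the two configurations, here is a rough count of the work involved, written as a small C++ sketch. The function names are made up for this illustration. It only counts innermost-loop iterations, using the same integer rounding as the shader, and says nothing about what actually triggers the error.

```cpp
#include <cassert>
#include <cstdint>

// Innermost-loop iterations performed by one thread of naiveMotion.
// searchRadius and kernelRadius are rounded down exactly as in the shader.
std::int64_t iterationsPerThread(int searchSize, int kernelSize) {
    assert(searchSize > 0 && kernelSize > 0);
    const int searchRadius = searchSize / 2;
    const int kernelRadius = kernelSize / 2;
    // y runs over 2*searchRadius values, x over searchSize values.
    const std::int64_t candidates =
        static_cast<std::int64_t>(2 * searchRadius) * searchSize;
    // a and b each run over 2*kernelRadius values.
    const std::int64_t window =
        static_cast<std::int64_t>(2 * kernelRadius) * (2 * kernelRadius);
    return candidates * window;
}

// Innermost-loop iterations for a whole image, one thread per pixel.
std::int64_t totalIterations(int width, int height,
                             int searchSize, int kernelSize) {
    return static_cast<std::int64_t>(width) * height *
           iterationsPerThread(searchSize, kernelSize);
}
```

With a kernelSize of 8, a searchSize of 9 gives 8 * 9 * 64 = 4608 iterations per thread and a searchSize of 10 gives 10 * 10 * 64 = 6400, i.e. about 1.4 times as many. Each iteration performs two texture reads, so that is 9216 versus 12800 reads per thread. Over the 960x540 image (518,400 threads) the totals are roughly 2.39 billion versus 3.32 billion iterations.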