gpgpu - Metal compute function limits

Question

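I have found that computation-intensive Metal shader functions tend to stop computing before all threadgroups have finished. When I use an MTLComputePipelineState and an MTLComputeCommandEncoder to blur an image with a very large blur radius, the resulting image is only half processed, and you can actually see half-finished threadgroups. I haven't narrowed it down to the exact blur radius, but 16 pixels works fine, while 32 is already too much: not even half of the groups are computed.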

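So is there a limit on how long a shader function call may take to complete, or anything similar? I have just worked through most of the documentation on using the Metal framework, and I don't remember coming across any such statement.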

EDIT

Since the problem in my case turned out to be not a simple timeout but some internal error, I am adding some code.

The most expensive part is a block-matching algorithm that finds matching blocks in two images (i.e. consecutive frames of a movie):

//Exhaustive Search Block-matching algorithm
kernel void naiveMotion(
    texture2d<float,access::read>   inputImage1   [[ texture(0) ]],
    texture2d<float,access::read>   inputImage2   [[ texture(1) ]],
    texture2d<float,access::write>  outputImage  [[ texture(2) ]],
    uint2 gid                                    [[ thread_position_in_grid ]]
)
{
    //area to search for matches
    float searchSize = 10.0;
    int searchRadius = searchSize/2;

    //window size to search in
    int kernelSize = 6;
    int kernelRadius = kernelSize/2;

    //this will store the motion direction
    float2 vector = float2(0.0,0.0);
    float2 maxVector = float2(searchSize,searchSize/2);
    float maxVectorLength = length(maxVector);

    //maximum error caused by noise
    float error = kernelSize*kernelSize*(10.0/255.0);


    for (int y = -searchRadius; y < searchRadius; ++y)
    {
        for (int x = 0; x < searchSize; ++x)
        {
            float diff = 0;
        
            for (int b = - kernelRadius; b < kernelRadius; ++b)
            {
                for (int a = - kernelRadius; a < kernelRadius; ++a)
                {
                    uint2 textureIndex(gid.x + x + a, gid.y + y + b);
                    float4 targetColor = inputImage2.read(textureIndex).rgba;
                    float4 referenceColor = inputImage1.read(gid).rgba;
                    float targetGray = 0.299*targetColor.r + 0.587*targetColor.g + 0.114*targetColor.b;
                    float referenceGray = 0.299*referenceColor.r + 0.587*referenceColor.g + 0.114*referenceColor.b;
                    diff = diff + abs(targetGray - referenceGray);
                }
            }
        
            if ( error > diff )
            {
                error = diff;
                //vertical motion is rather irrelevant but negative values can't be stored so just take the absolute value
                vector = float2(x, abs(y));
            }
        }
    }

    float intensity = length(vector)/maxVectorLength;
    outputImage.write(float4(normalize(vector), intensity, 1),gid);
}

I use this shader on a 960x540 px image. With a searchSize of 9 and a kernelSize of 8, the shader runs over the whole image. Changing searchSize to 10 makes the shader stop early with error code 1.
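To put the two configurations side by side, here is a rough CPU-side sketch in plain C++ that counts how often the innermost loop body of naiveMotion runs per thread. It mirrors the kernel's loop bounds, including the truncation of searchSize/2. The helper names innerIterationsPerThread and totalTextureReads are made up for illustration, and the total assumes one thread per pixel, which the question does not actually show.

```cpp
#include <cassert>
#include <cstdint>

// Counts how often the innermost loop body of naiveMotion runs for ONE thread.
// The loop bounds mirror the kernel, including the float-to-int truncation
// of searchSize / 2.
std::int64_t innerIterationsPerThread(float searchSize, int kernelSize) {
    assert(searchSize > 0.0f && kernelSize > 0);
    const int searchRadius = static_cast<int>(searchSize / 2);  // 9 -> 4, 10 -> 5
    const int kernelRadius = kernelSize / 2;
    std::int64_t count = 0;
    for (int y = -searchRadius; y < searchRadius; ++y)
        for (int x = 0; x < searchSize; ++x)
            for (int b = -kernelRadius; b < kernelRadius; ++b)
                for (int a = -kernelRadius; a < kernelRadius; ++a)
                    ++count;
    return count;
}

// Total texture reads for the whole grid: two read() calls per inner iteration.
// Assumption: one thread per pixel (the dispatch code is not part of the question).
std::int64_t totalTextureReads(int width, int height, float searchSize, int kernelSize) {
    return 2 * innerIterationsPerThread(searchSize, kernelSize)
             * static_cast<std::int64_t>(width) * height;
}
```

Under these assumptions, searchSize 9 with kernelSize 8 gives 4608 inner iterations per thread, and searchSize 10 with kernelSize 8 gives 6400, so about 39% more work per thread. Note that an odd searchSize makes the y loop shorter than the x loop (8 versus 9 iterations for searchSize 9). Whether the extra work is what triggers error code 1 is not something this count can establish.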
