ios - 如何使用使用金属的自定义计算着色器并获得非常流畅的性能？

Question

MPSKernal我尝试使用apple 和 custom 提供的默认过滤器通过金属应用实时相机过滤器compute Shaders。

在计算着色器中，我使用 MPSImageGaussianBlur 进行了就地编码，代码在这里

func encode(to commandBuffer: MTLCommandBuffer, sourceTexture: MTLTexture, destinationTexture: MTLTexture, cropRect: MTLRegion = MTLRegion.init(), offset : CGPoint) {

    let blur = MPSImageGaussianBlur(device: device, sigma: 0)
    blur.clipRect = cropRect
    blur.offset = MPSOffset(x: Int(offset.x), y: Int(offset.y), z: 0)

    let threadsPerThreadgroup = MTLSizeMake(4, 4, 1)
    let threadgroupsPerGrid = MTLSizeMake(sourceTexture.width / threadsPerThreadgroup.width, sourceTexture.height / threadsPerThreadgroup.height, 1)

    let commandEncoder = commandBuffer.makeComputeCommandEncoder()
    commandEncoder.setComputePipelineState(pipelineState!)
    commandEncoder.setTexture(sourceTexture, at: 0)
    commandEncoder.setTexture(destinationTexture, at: 1)

    commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)

    commandEncoder.endEncoding()

    autoreleasepool {
        var inPlaceTexture = destinationTexture
        blur.encode(commandBuffer: commandBuffer, inPlaceTexture: &inPlaceTexture, fallbackCopyAllocator: nil)
    }
}

但有时原地纹理往往会失败，最终会在屏幕上产生抖动效果。

因此，如果有人可以在不使用就地纹理或如何使用fallbackCopyAllocator或compute shaders以不同方式使用的情况下向我提出解决方案，那将非常有帮助。

score 2 · Accepted Answer

我在这方面做了足够多的编码（将计算着色器应用到来自摄像机的视频流），您遇到的最常见问题是“像素缓冲区重用”问题。

您从样本缓冲区创建的金属纹理会备份一个像素缓冲区，该缓冲区由视频会话管理，并且可以重新用于后续视频帧，除非您保留对样本缓冲区的引用（保留对金属质感不够）。

请随意查看我在https://github.com/snakajima/vs-metal上的代码，它将各种计算着色器应用于实时视频流。

VSContext:set() 方法除了纹理参数外还接受可选的 sampleBuffer 参数，并保留对 sampleBuffer 的引用，直到计算着色器的计算完成（在 VSRuntime:encode() 方法中）。

score 0 · Accepted Answer

就地操作方法可能会被命中或错过，具体取决于底层过滤器正在执行的操作。如果它是某些参数的单通过滤器，那么您最终会在这些情况下用完。

由于添加了该方法，MPS 添加了一个底层 MTLHeap 来为您更透明地管理内存。如果您的 MPSImage 不需要 CPU 查看并且在 GPU 上仅存在很短的时间，建议您只使用 MPSTemporaryImage 代替。当 readCount 达到 0 时，后备存储将通过 MPS 堆回收，并可供其他 MPSTemporaryImages 和下游使用的其他临时资源使用。同样，它的后备存储实际上不会从堆中分配，直到绝对必要（例如，写入纹理或调用 .texture）为每个命令缓冲区分配一个单独的堆。

使用临时图像应该有助于大大减少内存使用量。例如，在一个 Inception v3 神经网络图中，它有一百多个通道，堆能够自动将图减少到只有四个分配。

ios - 如何使用使用金属的自定义计算着色器并获得非常流畅的性能？

2 回答 2

Related

Reference