macos - How should you properly encode large numbers of blit or scale commands in Metal?

Question

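In an app I'm working on that uses a traditional Metal rendering loop, I periodically need to copy a lot of image data from IOSurfaces to MTLTextures so that the data can be accessed in a fragment shader. I'm trying to learn the most efficient way to do that.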

Each IOSurface represents one tile of a potentially very large image (think of a stitched panorama). Core Image is used to render image data into each IOSurface.

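On the Metal side, I have an MTLTexture of type 2DArray that contains enough slices to cover the viewport and/or the image itself, if the image is "zoomed out" to smaller than the view size.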

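The IOSurface and the MTLTexture each have power-of-two dimensions, but sometimes the two have different dimensions. When they are the same size, I use an MTLBlitCommandEncoder, but when they differ I use MPSImageScale.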
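For reference, `MPSImageScale` itself is an abstract class; `MPSImageBilinearScale` and `MPSImageLanczosScale` are its concrete subclasses. Because MPS image kernels write into an ordinary 2D texture, one way to scale a tile into a single slice of the 2D array texture is to make a 2D texture view of that slice first. A minimal sketch, assuming the destination was created with `.shaderWrite` usage; `encodeScaledCopy` is a hypothetical helper name, not part of any API:

```swift
import Metal
import MetalPerformanceShaders

/// Hypothetical helper: scales `source` into one slice of a 2D array texture.
/// Assumes `destination` is a 2D array texture created with `.shaderWrite` usage.
/// Returns false if the slice view could not be created.
func encodeScaledCopy(from source: MTLTexture,
                      into destination: MTLTexture,
                      slice: Int,
                      commandBuffer: MTLCommandBuffer) -> Bool {
    // Expose a single array slice as a plain 2D texture for the MPS kernel.
    guard let sliceView = destination.makeTextureView(
        pixelFormat: destination.pixelFormat,
        textureType: .type2D,
        levels: 0..<1,
        slices: slice..<(slice + 1)) else { return false }

    // Bilinear is one concrete MPSImageScale subclass; Lanczos is the other.
    let scaler = MPSImageBilinearScale(device: commandBuffer.device)
    // With no scaleTransform set, the source is rescaled to fill the destination exactly.
    scaler.encode(commandBuffer: commandBuffer,
                  sourceTexture: source,
                  destinationTexture: sliceView)
    return true
}
```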

If I need to copy a large number of IOSurfaces into a large number of Metal textures, should I do it one at a time, in batches, or all at once?

Attempt #1: All at once

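This approach works, but it starts to break down when the number of visible surfaces gets very large: you end up pre-allocating a lot of surface-backed textures before committing. This approach seems the most logical to me, but it also triggers the most warnings in Xcode's GPU Insights and uses the most texture memory when it doesn't need to.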

Pseudo-code below:

func renderAllAtOnce() { 

  // Create one command buffer. 
  let commandBuffer = commandQueue.makeCommandBuffer()
  let blitEncoder = commandBuffer.makeBlitCommandEncoder()

  // Encode a copy for each surface.
  for surface in visibleSurfaces { 

    // Make a texture from the surface.
    let surfaceTexture = makeTextureFromSurface(surface)

    // Copy from the surface-backed texture into the appropriate slice in the destination texture.
    blitEncoder.copy(surfaceTexture, to: destinationTexture, slice:...)
  }

  // End encoding, then commit the command buffer and wait for the GPU.
  blitEncoder.endEncoding()
  commandBuffer.commit()
  commandBuffer.waitUntilCompleted()

  // Bind textures and issue draw calls using a render encoder.
  renderEncoder.draw(...)
}

Attempt #2: In batches

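In this implementation, I arbitrarily group the copy commands into groups of 10. This means I only ever pre-allocate up to 10 surface-backed "sourceTextures" before committing the buffer. This seems to make the GPU a little happier, but the value of 10 feels rather arbitrary. Is there an optimal number here that can be determined from the hardware?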

func renderInBatches() { 

  // Arbitrarily group surfaces into groups of 10.
  for group in visibleSurfaces(groupsOf: 10) {

    // Create a new command buffer and encoder for each group.
    let commandBuffer = commandQueue.makeCommandBuffer()
    let blitEncoder = commandBuffer.makeBlitCommandEncoder()

    // Encode only up to 10 copy commands.
    for surface in group { 
      let surfaceTexture = makeTextureFromSurface(surface)
      blitEncoder.copy(surfaceTexture, to: destinationTexture, slice:...)
    }
  
    blitEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
  }

  // Bind textures and issue draw calls using a render encoder.
}
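`visibleSurfaces(groupsOf:)` in the pseudo-code above is not a standard library method. A minimal sketch of a hypothetical generic chunking helper that could back it, so the batch size becomes a single tunable parameter instead of a literal `10`:

```swift
extension Array {
    /// Hypothetical helper: splits the array into consecutive groups of at most `size`
    /// elements. The last group holds the remainder and may be smaller.
    func chunked(into size: Int) -> [ArraySlice<Element>] {
        precondition(size > 0, "batch size must be positive")
        return stride(from: 0, to: count, by: size).map { start in
            // Swift.min avoids clashing with Sequence.min() inside this extension.
            self[start..<Swift.min(start + size, count)]
        }
    }
}
```

With this, the loop header becomes `for group in visibleSurfaces.chunked(into: batchSize)`, where `batchSize` is whatever value profiling suggests.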

Attempt #3: One at a time

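No code for this one: this option is just the batch option above with a group size of 1. In effect, you create a new command buffer and blit encoder for every surface that needs to be copied into a texture. At first this seemed very wasteful, but I now realize that command buffers and encoders are fairly lightweight. After all, you create new ones on every render pass anyway.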

But does doing them one at a time under-utilize the GPU? There are no dependencies between the copy operations.
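One thing worth noting about attempts #2 and #3: the `waitUntilCompleted()` after every batch blocks the CPU until the GPU has drained that batch, so CPU and GPU take turns instead of overlapping. A common alternative pattern (not from the original post) is to commit each batch without waiting and cap the number of batches in flight with a semaphore. The sketch below assumes a single `MTLCommandQueue`; `ThrottledUploader` and `maxInFlight` are hypothetical names, and the default of 3 is arbitrary, not a hardware-derived optimum:

```swift
import Metal
import Dispatch

/// Hypothetical type: commits one command buffer per batch without calling
/// waitUntilCompleted, but never lets more than `maxInFlight` batches be pending at once.
final class ThrottledUploader {
    private let queue: MTLCommandQueue
    private let inFlight: DispatchSemaphore

    init(queue: MTLCommandQueue, maxInFlight: Int = 3) {
        self.queue = queue
        self.inFlight = DispatchSemaphore(value: maxInFlight)
    }

    /// Encodes one batch via `encode`, commits it, and returns the committed buffer.
    @discardableResult
    func submit(_ encode: (MTLCommandBuffer) -> Void) -> MTLCommandBuffer? {
        // Blocks only while `maxInFlight` earlier batches are still unfinished.
        inFlight.wait()
        guard let commandBuffer = queue.makeCommandBuffer() else {
            inFlight.signal()
            return nil
        }
        encode(commandBuffer)
        // Release the slot when the GPU finishes this batch.
        let semaphore = inFlight
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
        commandBuffer.commit()
        return commandBuffer
    }
}
```

A command buffer from `makeCommandBuffer()` retains the resources it references until it completes, so capping in-flight buffers also caps how many batches' worth of surface-backed textures Metal keeps alive at once.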

TL;DR

If you have to issue a large number of blit copy commands, or scale commands using MPS, what is the most efficient and "correct" way to do it?

Currently I'm building for macOS 11.0 and later. The app should run on any supported hardware.

score 0 · Accepted Answer

You should definitely put as much work as possible into each command buffer and encoder.

In this case, you can have a single command buffer that you first fill with the image filters, and then perform all the blits in a single blit command encoder.
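A minimal sketch of that structure, assuming each tile is already wrapped in a surface-backed `MTLTexture` with the same pixel format as the destination (blit copies require matching formats) and that the destination 2D array texture has `.shaderWrite` usage. `TileUpload` and `encodeTileUploads` are hypothetical names, and `MPSImageBilinearScale` stands in for whichever `MPSImageScale` subclass is actually used:

```swift
import Metal
import MetalPerformanceShaders

/// Hypothetical description of one tile: a source texture and its destination slice.
struct TileUpload {
    let source: MTLTexture
    let slice: Int
}

/// One command buffer for all tiles: MPS scales first, then every same-size copy
/// in a single blit encoder.
func encodeTileUploads(_ uploads: [TileUpload],
                       into destination: MTLTexture,
                       commandBuffer: MTLCommandBuffer) {
    func matches(_ t: MTLTexture) -> Bool {
        t.width == destination.width && t.height == destination.height
    }
    let sameSize = uploads.filter { matches($0.source) }
    let needsScale = uploads.filter { !matches($0.source) }

    // MPS encodes its own compute passes; one scaler object is reused for every tile.
    let scaler = MPSImageBilinearScale(device: commandBuffer.device)
    for upload in needsScale {
        // MPS writes to a plain 2D texture, so view the target slice as 2D.
        guard let sliceView = destination.makeTextureView(
            pixelFormat: destination.pixelFormat, textureType: .type2D,
            levels: 0..<1, slices: upload.slice..<(upload.slice + 1)) else { continue }
        scaler.encode(commandBuffer: commandBuffer,
                      sourceTexture: upload.source, destinationTexture: sliceView)
    }

    // All equal-size copies share one blit encoder.
    guard !sameSize.isEmpty,
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    for upload in sameSize {
        blit.copy(from: upload.source, sourceSlice: 0, sourceLevel: 0,
                  sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
                  sourceSize: MTLSize(width: upload.source.width,
                                      height: upload.source.height, depth: 1),
                  to: destination, destinationSlice: upload.slice, destinationLevel: 0,
                  destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    }
    blit.endEncoding()
}
```

The frame's render encoder can then be created from the same command buffer. With Metal's default tracked hazard tracking, the draw that samples `destination` should be ordered after these writes, so no `waitUntilCompleted()` is needed between the copies and the draw.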

On the other hand, you can also create an MTLTexture directly from an IOSurface, so if they have the same dimensions you don't need to blit at all.

https://developer.apple.com/documentation/metal/mtldevice/1433378-newtexturewithdescriptor?language=objc
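In Swift, the linked `newTextureWithDescriptor:iosurface:plane:` is `makeTexture(descriptor:iosurface:plane:)`. A minimal sketch of what the question's `makeTextureFromSurface` could look like, assuming each surface holds a single plane of 32-bit BGRA pixels; the extra `device` parameter is an assumption added so the function is self-contained:

```swift
import Metal
import IOSurface

/// Hypothetical implementation of the question's makeTextureFromSurface.
/// Assumes a single-plane IOSurface with 'BGRA' (32-bit) pixels.
func makeTextureFromSurface(_ surface: IOSurfaceRef, device: MTLDevice) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm,
        width: IOSurfaceGetWidth(surface),
        height: IOSurfaceGetHeight(surface),
        mipmapped: false)
    descriptor.usage = [.shaderRead]
    // The texture aliases the surface's memory; no pixels are copied here.
    return device.makeTexture(descriptor: descriptor, iosurface: surface, plane: 0)
}
```

Because the texture shares the surface's memory instead of copying it, a fragment shader could sample it directly, which is why no blit is needed when the sizes already match.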
