On the CPU side, I have a struct that I pass to a compute kernel:
  private struct BoundingBoxParameters {
    var x: Float = 0
    var y: Float = 0
    var width: Float = 0
    var height: Float = 0
    var levelOfDetail: Float = 1.0
    var dummy: Float = 1.0  // Needed for success
  }
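The only difference between the working and failing versions of this struct is its byte size. The minimal sketch below makes that concrete. The type names `BoundingBoxParametersWithDummy` and `BoundingBoxParametersNoDummy` are hypothetical and exist only to compare the two variants. `MemoryLayout` reports 20 bytes without `dummy` and 24 bytes with it, which are the two numbers in the validation error:

```swift
// Hypothetical copies of the struct from the question, with and without `dummy`.
struct BoundingBoxParametersWithDummy {
  var x: Float = 0
  var y: Float = 0
  var width: Float = 0
  var height: Float = 0
  var levelOfDetail: Float = 1.0
  var dummy: Float = 1.0
}

struct BoundingBoxParametersNoDummy {
  var x: Float = 0
  var y: Float = 0
  var width: Float = 0
  var height: Float = 0
  var levelOfDetail: Float = 1.0
}

// Float has 4-byte alignment, so neither struct gets tail padding:
// size == stride == 4 bytes per field.
print(MemoryLayout<BoundingBoxParametersNoDummy>.size)    // 20
print(MemoryLayout<BoundingBoxParametersNoDummy>.stride)  // 20
print(MemoryLayout<BoundingBoxParametersWithDummy>.size)  // 24
```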
Before running the kernel, I pass the data to the MTLComputeCommandEncoder in one of two ways.
Option 1 (directly):
commandEncoder!.setBytes(&params, length: MemoryLayout<BoundingBoxParameters>.size, index: 0)
Option 2 (indirectly, via an MTLBuffer):
boundingBoxBuffer.contents().copyBytes(from: &params, count: MemoryLayout<BoundingBoxParameters>.size)
commandEncoder!.setBuffer(boundingBoxBuffer, offset: 0, index: 0)
Both options work fine when the "dummy" field is present in the struct, but both fail when it is removed. The failing call is:
commandEncoder!.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
and the error is:
validateComputeFunctionArguments:820: failed assertion `Compute Function(resizeImage): argument params[0] from buffer(0) with offset(0) and length(20) has space for 20 bytes, but argument has a length(24).'
On the Metal kernel side, here is the relevant code:
struct BoundingBoxParameters {
  float2 topLeft;
  float2 size;
  float levelOfDetail;
};
kernel void resizeImage(constant BoundingBoxParameters *params [[buffer(0)]],
                        texture2d<half, access::sample> sourceTexture [[texture(0)]],
                        texture2d<half, access::write> destTexture [[texture(1)]],
                        sampler samp [[sampler(0)]],
                        uint2 gridPosition [[thread_position_in_grid]]) {
  float2 destSize = float2(destTexture.get_width(0), destTexture.get_height(0));
  float2 sourceCoords = float2(gridPosition) / destSize;
  sourceCoords *= params->size;
  sourceCoords += params->topLeft;
  float lod = params->levelOfDetail;
  half4 color = sourceTexture.sample(samp, sourceCoords, level(lod));
  destTexture.write(color, gridPosition);
}
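For comparison, here is a hypothetical Swift mirror of the Metal struct that uses `SIMD2<Float>` for each `float2`. It assumes, as the 24-byte length in the error suggests, that Metal's `float2` is 8 bytes with 8-byte alignment, so the 20 bytes of fields get padded up to a multiple of 8. `SIMD2<Float>` has that same size and alignment, so Swift's `stride` for this struct comes out to 24 even though its `size` is 20:

```swift
// Hypothetical Swift mirror of the Metal BoundingBoxParameters struct.
// SIMD2<Float> is 8 bytes with 8-byte alignment, standing in for Metal's float2.
struct BoundingBoxParametersSIMD {
  var topLeft = SIMD2<Float>(0, 0)
  var size = SIMD2<Float>(0, 0)
  var levelOfDetail: Float = 1.0
}

print(MemoryLayout<BoundingBoxParametersSIMD>.size)       // 20: bytes occupied by the fields
print(MemoryLayout<BoundingBoxParametersSIMD>.alignment)  // 8: inherited from SIMD2<Float>
print(MemoryLayout<BoundingBoxParametersSIMD>.stride)     // 24: size rounded up to the alignment
```

If the layouts do match, passing `MemoryLayout<BoundingBoxParametersSIMD>.stride` rather than `.size` as the length would presumably satisfy the validation check without a dummy field. That is an untested assumption, not a confirmed fix.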
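I ran into a similar problem when passing a 3x3 matrix to another compute kernel: the validation layer complains that 36 bytes were provided but 48 were expected.
Does anyone have an idea what is going on here?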
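The 36 versus 48 numbers follow the same pattern. The sketch below assumes that Metal's `float3x3` is stored as three `float3` columns, each padded to 16 bytes like a `float4`. The `Float3x3Columns` type is hypothetical, with `SIMD3<Float>` (also 16 bytes in Swift) standing in for `float3`:

```swift
// Hypothetical column-major 3x3 matrix laid out like Metal's float3x3:
// each SIMD3<Float> column occupies 16 bytes (12 bytes of data + 4 of padding).
struct Float3x3Columns {
  var c0 = SIMD3<Float>(1, 0, 0)
  var c1 = SIMD3<Float>(0, 1, 0)
  var c2 = SIMD3<Float>(0, 0, 1)
}

// Nine tightly packed Floats, e.g. from a [Float] of 9 elements.
let packedBytes = 9 * MemoryLayout<Float>.size
// Three padded columns.
let paddedBytes = MemoryLayout<Float3x3Columns>.size

print(packedBytes)  // 36
print(paddedBytes)  // 48
```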