In OpenCL, when we declare the input arguments as const global * restrict, we get an efficient hardware path (for a piece of hand-written OpenCL code):
__kernel void oclConvolveGlobalMem(const global float* restrict input,
constant float* restrict filterWeights,
global float* restrict output)
However, as seen with HL_DEBUG_CODEGEN=1, Halide generates:
// Address spaces for kernel_conv_70_s0_y___block_id_y
#define __address_space__conv__70 __global
#define __address_space__input __global
#define __address_space__kernel __global
__kernel void kernel_conv_70_s0_y___block_id_y(
const int _conv__70_extent_0,
const int _conv__70_extent_1,
const int _conv__70_min_0,
const int _conv__70_min_1,
const int _conv__70_stride_1,
const int _input_min_0,
const int _input_min_1,
const int _input_stride_1,
const int _kernel_min_0,
const int _kernel_min_1,
const int _kernel_stride_1,
__address_space__conv__70 float *_conv__70,
__address_space__input const float *_input,
__address_space__kernel const float *_kernel,
__address_space___shared int16* __shared)
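where the input arguments are not declared restrict. I expect this will seriously limit performance. Is there a way to make Halide add the notion that the pointers are restricted (that the buffers they use are not aliased)?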
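To make the concern concrete, here is a minimal plain-C sketch of what the qualifier promises the compiler. It is not Halide output and not the kernel above: the function name convolve_restrict and its 1-D layout are made up for illustration, and it assumes a C99 compiler.

```c
#include <stddef.h>

/* Hypothetical 1-D convolution, for illustration only (not Halide output).
 * `restrict` promises the compiler that input, weights and output never
 * overlap, so loads from input/weights may be kept in registers or
 * reordered across the stores to output.
 * The caller must provide at least n_out + n_taps - 1 input elements. */
void convolve_restrict(const float *restrict input,
                       const float *restrict weights,
                       float *restrict output,
                       size_t n_out, size_t n_taps)
{
    for (size_t i = 0; i < n_out; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < n_taps; ++k) {
            acc += input[i + k] * weights[k];
        }
        output[i] = acc;
    }
}
```

Without restrict, the compiler has to assume that a store to output[i] could modify input or weights, which is the conservative behaviour I suspect the generated kernel is stuck with.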