
I am trying to perform an operation inside an OpenCL kernel. I have a buffer named filter, which is a 3x3 matrix with every element initialized to 1.

I pass this buffer as an argument to the OpenCL kernel from the host side. The issue appears when I read the buffer on the device side as float3 vectors. For example:

// The kernel name was missing in the original post; "apply_filter" is a placeholder.
__kernel void apply_filter(constant float3* restrict filter)
{
        float3 temp1 = filter[0];
        float3 temp2 = filter[1];
        float3 temp3 = filter[2];
}

The first two temp variables behave as expected and have all components equal to 1. But the third (temp3) has only its x component equal to 1, while its y and z components are 0. When I read the buffer as plain float values, everything behaves as expected. Am I doing something wrong? I don't want to use the vload instructions, as they add overhead.

1 Answer

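In OpenCL, float3 has the same size and alignment as float4 (it is effectively laid out as a float4), so your 9 values fill the xyzw components of temp1 and temp2, plus temp3.x. You will probably need to use the vload3 instruction.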

See section 6.1.5, Alignment of Types, of the OpenCL specification for more information:

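For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary. The vload3 and vstore3 built-in functions can be used to read and write, respectively, 3-component vector data types from an array of packed scalar data type.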
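To make the addressing concrete, here is a minimal plain-C sketch that emulates the two ways of reading the 9-float buffer. It is not OpenCL code: float3_emul, load_as_float3 and load_packed3 are hypothetical names made up for this illustration, and out-of-bounds floats are simply reported as 0, whereas on a real device such a read is undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OpenCL float3: three used components plus
   one padding float, so its size is 4 * sizeof(float). */
typedef struct { float x, y, z, pad; } float3_emul;

/* Emulates filter[i] on a float3 pointer: element i starts at float 4*i.
   n is the number of floats actually present in the buffer. Floats past
   the end are reported as 0 (an assumption made only for this sketch). */
static float3_emul load_as_float3(const float *p, size_t n, size_t i)
{
    float3_emul v;
    size_t base = 4 * i;
    assert(p != NULL);
    v.x   = (base + 0 < n) ? p[base + 0] : 0.0f;
    v.y   = (base + 1 < n) ? p[base + 1] : 0.0f;
    v.z   = (base + 2 < n) ? p[base + 2] : 0.0f;
    v.pad = (base + 3 < n) ? p[base + 3] : 0.0f;
    return v;
}

/* Emulates vload3(i, p) on a packed float pointer: element i starts at
   float 3*i, so three rows of three floats are read without gaps. */
static float3_emul load_packed3(const float *p, size_t i)
{
    float3_emul v;
    size_t base = 3 * i;
    assert(p != NULL);
    v.x = p[base + 0];
    v.y = p[base + 1];
    v.z = p[base + 2];
    v.pad = 0.0f;
    return v;
}
```

With a 9-float buffer, load_as_float3(buf, 9, 2) starts at float 8 and finds only one valid value, which matches the temp3 symptom in the question, while load_packed3(buf, 2) starts at float 6 and returns the full third row. In the kernel itself, the corresponding change would be to declare the argument as constant float* and read each row with vload3(i, filter).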

Answered 2019-09-26T08:46:22.707