optimization - OpenCL less-than-or-equal and boolean vectors

Question

I have a situation that I solved in the following way:

//cube_potentials is float8
//level_vec is float8
//shift_vec is int8 and contains (non-overlapping) bit shifts
int8 shifts = (cube_potentials<=level_vec);
int flag_index = 0;
if (shifts.s0) flag_index |= shift_vec.s0;
if (shifts.s1) flag_index |= shift_vec.s1;
if (shifts.s2) flag_index |= shift_vec.s2;
if (shifts.s3) flag_index |= shift_vec.s3;
if (shifts.s4) flag_index |= shift_vec.s4;
if (shifts.s5) flag_index |= shift_vec.s5;
if (shifts.s6) flag_index |= shift_vec.s6;
if (shifts.s7) flag_index |= shift_vec.s7;

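It works. The problem is that all those if statements annoy me, and I can't imagine they're the fastest thing in the world either. I would like to solve it like so: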

//Method 1
bool8 less = (bool8)(cube_potentials<=level_vec);
int8 shifts = (int8)(less) * shift_vec;
int flag_index = shifts.s0 | shifts.s1 | shifts.s2 | shifts.s3 | shifts.s4 | shifts.s5 | shifts.s6 | shifts.s7;

//Method 2 (more simply)
int8 shifts = ((int8)(cube_potentials<=level_vec)) * shift_vec;
int flag_index = shifts.s0 | shifts.s1 | shifts.s2 | shifts.s3 | shifts.s4 | shifts.s5 | shifts.s6 | shifts.s7;

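The problem is that bool8 is a reserved type, not a real one, so Method 1 is out. Method 2, however, doesn't work properly. I suspect the cause lies in its first line: I don't know what <= returns when applied to two float vectors, but presumably, once it is cast to int8, the result isn't all 0s and 1s.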
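The suspicion about the first line of Method 2 can be checked on the host. In OpenCL C, a relational operator applied to vector operands yields -1 (all bits set) for each true component and 0 for each false one, so multiplying by the mask negates the shift values instead of selecting them. The following is a plain C sketch that emulates that behaviour on arrays, not OpenCL code; the helper names (vec_le, method2_multiply, masked_and) and the sample values are made up for illustration.

```c
#include <stdint.h>

#define LANES 8

/* Emulates OpenCL's component-wise `a <= b` on float8:
   each lane becomes -1 (all bits set) when true, 0 when false. */
static void vec_le(const float a[LANES], const float b[LANES], int32_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (a[i] <= b[i]) ? -1 : 0;
}

/* Method 2 as written: multiply the mask by the shift values, then OR the lanes.
   Because a true lane is -1, each selected value x turns into -x. */
static int32_t method2_multiply(const int32_t mask[LANES], const int32_t shift_vec[LANES]) {
    int32_t flag_index = 0;
    for (int i = 0; i < LANES; ++i)
        flag_index |= mask[i] * shift_vec[i];
    return flag_index;
}

/* Same reduction with a bitwise AND instead: -1 & x == x and 0 & x == 0. */
static int32_t masked_and(const int32_t mask[LANES], const int32_t shift_vec[LANES]) {
    int32_t flag_index = 0;
    for (int i = 0; i < LANES; ++i)
        flag_index |= mask[i] & shift_vec[i];
    return flag_index;
}
```

With lanes 0, 2, 4, 5 and 7 passing the comparison and shift values 1, 2, 4, ..., 128, the multiply version returns -1 (the lane holding -1 sets every bit), while the AND version returns 181, which is what the chain of if statements would produce.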

My question is: is there any way to rewrite the original code in a cleaner, more parallel way?

Thanks,

score 3 · Accepted Answer

Try this. It might work:

// gives -1 (0xFFFFFFFF) for true or 0 for false, for each comparison:
int8 shifts = cube_potentials <= level_vec;

// leaves only the elements that passed the above compare:
shift_vec &= shifts;

// start combining (with all 8 elements):
shift_vec.lo |= shift_vec.hi;

// keep going (down to the bottom 4):
shift_vec.lo.lo |= shift_vec.lo.hi;

// last one (only considering the bottom two):
int flag_index = shift_vec.lo.lo.lo |= shift_vec.lo.lo.hi;
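For anyone who wants to check the steps above outside a kernel, here is a minimal host-side C sketch of the same mask-then-halve reduction. It is an emulation on plain arrays, not OpenCL code; the function name flag_index_tree is hypothetical, and it works on a local copy, whereas the snippet above overwrites shift_vec.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the steps above on plain arrays. */
static int32_t flag_index_tree(const float cube_potentials[8],
                               const float level_vec[8],
                               const int32_t shift_vec[8]) {
    int32_t v[8];

    /* Compare and mask: a true lane keeps its shift value, a false lane becomes 0. */
    for (int i = 0; i < 8; ++i) {
        int32_t mask = (cube_potentials[i] <= level_vec[i]) ? -1 : 0;
        v[i] = shift_vec[i] & mask;
    }

    /* Repeated "lo |= hi": the active width goes 8 -> 4 -> 2 -> 1. */
    for (int width = 4; width >= 1; width /= 2)
        for (int i = 0; i < width; ++i)
            v[i] |= v[i + width];

    return v[0];
}
```

The reduction takes three OR steps for eight lanes instead of a chain of eight conditionals, which is the point of the halving scheme.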

score 0 · Accepted Answer

Edit: OK, second attempt:

flag_index = dot(shift_vec, -islessequal(cube_potentials, level_vec));

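It could do with a good comment, though.

islessequal() should return -1 or 0 for true and false.
We negate that to get 1 or 0.
Then we use the dot product to sum the components of shift_vec for which the comparison returned true.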

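Notes:

dot() is usually a hardware instruction, so it should be fast.
islessequal() can be replaced with <=.
Because it sums rather than ORs, this only works if shift_vec has non-overlapping bit values (which you say it does).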
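The last note can be made concrete with a small host-side C sketch comparing a sum with a bitwise OR over the selected lanes. The names combine_sum and combine_or are hypothetical, and mask is assumed to hold -1 for true and 0 for false, as islessequal() returns for vector arguments. One caveat: as far as I'm aware, OpenCL C's built-in dot() is specified for floating-point vectors of up to four components, so the one-liner above would likely need adapting for int8 operands; the sketch only demonstrates the arithmetic.

```c
#include <stdint.h>

/* Sum of the selected lanes: negating the mask turns -1 into 1, so each
   true lane contributes its shift value once (what the dot product computes). */
static int32_t combine_sum(const int32_t mask[8], const int32_t shift_vec[8]) {
    int32_t total = 0;
    for (int i = 0; i < 8; ++i)
        total += (-mask[i]) * shift_vec[i];
    return total;
}

/* Bitwise OR of the selected lanes, as in the original chain of if statements. */
static int32_t combine_or(const int32_t mask[8], const int32_t shift_vec[8]) {
    int32_t flags = 0;
    for (int i = 0; i < 8; ++i)
        flags |= mask[i] & shift_vec[i];
    return flags;
}
```

With all lanes true and shift values 1, 2, 4, ..., 128, both functions return 255. If two lanes share a bit, for example shift values 1, 1, 2, 4, 8, 16, 32, 64, the sum is 128 while the OR is 127, so the two approaches only agree when the bits do not overlap.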
