我在 2 个 16 位向量之间有以下乘法:
int16x8_t dx;
int16x8_t dy;
int16x8_t dxdy = vmulq_s16(dx, dy);
如果dx
和dy
都足够大,结果会溢出。
我想钳制 MIN_INT16 和 MAX_INT16 值之间的乘积;
如果不先将值转换为 int32,我还没有找到一种方法。这就是我现在所做的:
int32x4_t dx_low4 = vmovl_s16(simde_vget_low_s16(dx)); // get lower 4 elements and widen
int32x4_t dx_high4 = vmovl_high_s16(dx); // widen higher 4 elements
int32x4_t dy_low4 = vmovl_s16(simde_vget_low_s16(dy)); // get lower 4 elements and widen
int32x4_t dy_high4 = vmovl_high_s16(dy); // widen higher 4 elements
int32x4_t dxdy_low = vmulq_s32(dx_low4, dy_low4);
int32x4_t dxdy_high = vmulq_s32(dx_high4, dy_high4);
// combine and handle saturation:
int16x8_t dxdy = vcombine_s16(vqmovn_s32(dxdy_low), vqmovn_s32(dxdy_high));
有没有办法更有效地实现这一目标?