I'm reading and writing lots of FITS and DNG images, which may contain data whose endianness differs from that of my platform and/or OpenCL device.

Currently I swap the byte order in host memory when necessary, which is very slow and requires an extra step.

Is there a fast way to pass a buffer of int/float/short values with the wrong endianness to an OpenCL kernel?

Using an extra kernel run just to fix the endianness would be OK; some overhead-free, auto-fixing read/write operation would be perfect.

I know about the variable attributes __attribute__((endian(host))) and __attribute__((endian(device))), but they don't help with a big-endian FITS file on a little-endian platform using a little-endian device.

I thought about a solution like this one (neither implemented nor tested, yet):

uchar4 mask = (uchar4)(3, 2, 1, 0); // shuffle() needs a mask whose element size matches the data (uchar here)
uchar4 swappedEndianness = shuffle(originalEndianness, mask);
// to be applied on a float/int-buffer somehow

Hoping there's a better solution out there.

Thanks in advance, runtimeterror

2 Answers

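Sure. Since you have a uchar4, you can simply swizzle the components and write them back.

output[tid] = input[tid].wzyx;

Swizzling is also very cheap and performs well on SIMD architectures, so you should be able to combine it with the other operations in your kernel.

Hope this helps!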

answered 2013-05-14T01:00:48.613

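Most processor architectures perform best when instructions operate on values that match their register width, for example 32 or 64 bits. When a CPU/GPU performs byte-wise operations like this, using the .wxyz subscript of a uchar4, it has to use a mask to extract each byte from the integer, shift the byte, and then merge it into the result with an integer add or OR. For a byte-order swap, the processor has to perform this AND, shift, and add/OR sequence four times, because there are four bytes.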
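To make that cost concrete, here is a minimal sketch in plain C (not OpenCL C) of the byte-wise sequence described above: four mask-and-shift steps merged with OR. The names swap32_bytewise and swap32_buffer_bytewise are hypothetical, picked only for illustration, and what a given OpenCL compiler actually emits for a uchar4 swizzle is device-specific and may well be cheaper than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise 32-bit swap: isolate each byte with a mask, shift it to its
   mirrored position, and OR the four pieces together (4 ANDs, 4 shifts,
   3 ORs). */
static inline uint32_t swap32_bytewise(uint32_t n)
{
    return ((n & 0x000000FFu) << 24) |
           ((n & 0x0000FF00u) <<  8) |
           ((n & 0x00FF0000u) >>  8) |
           ((n & 0xFF000000u) >> 24);
}

/* Hypothetical helper: swap every 32-bit word of a buffer in place. */
static inline void swap32_buffer_bytewise(uint32_t *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        buf[i] = swap32_bytewise(buf[i]);
}
```

For example, swap32_bytewise(0xAABBCCDDu) yields 0xDDCCBBAA.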

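The most efficient approach is the following:

#define EndianSwap(n) (rotate((n) & 0x00FF00FF, 24U) | (rotate((n), 8U) & 0x00FF00FF))

n can be any gentype, for example a uint4 variable. Because OpenCL C does not allow C++-style function overloading, a macro is the best option.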
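As a sanity check of the mask-and-rotate idea, here is a minimal sketch in plain C rather than OpenCL C. The helper rotl32 is a hypothetical stand-in for OpenCL's rotate() built-in, and swap32_rotate and swap32x4_rotate are illustrative names; the four-element array only imitates a uint4, so treat it as an assumption about how the macro would behave per lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 32-bit left rotate, standing in for OpenCL's rotate().
   Masking the right-shift count avoids an undefined shift by 32
   when k == 0. */
static inline uint32_t rotl32(uint32_t v, unsigned k)
{
    k &= 31u;
    return (v << k) | (v >> ((32u - k) & 31u));
}

/* Two rotates, two ANDs, one OR:
   rotl(n & 0x00FF00FF, 24) moves bytes 0 and 2 to positions 3 and 1,
   rotl(n, 8) & 0x00FF00FF moves bytes 3 and 1 to positions 0 and 2. */
static inline uint32_t swap32_rotate(uint32_t n)
{
    return rotl32(n & 0x00FF00FFu, 24) | (rotl32(n, 8) & 0x00FF00FFu);
}

/* Imitates applying the macro to a uint4: each lane is swapped
   independently. */
static inline void swap32x4_rotate(uint32_t v[4])
{
    assert(v != NULL);
    for (size_t i = 0; i < 4; ++i)
        v[i] = swap32_rotate(v[i]);
}
```

With n = 0xAABBCCDD, the first term is 0xDD00BB00 and the second is 0x00CC00AA, so the result is 0xDDCCBBAA. Applying the swap twice returns the original value, which makes a quick self-test easy.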

answered 2014-02-06T14:32:02.013