I'm reading and writing lots of FITS and DNG images, which may contain data whose endianness differs from that of my platform and/or OpenCL device.
Currently I swap the byte order in host memory when necessary, which is very slow and requires an extra step.
Is there a fast way to pass a buffer of int/float/short values with the wrong endianness to an OpenCL kernel?
An extra kernel run just to fix the endianness would be OK; a zero-overhead read/write operation that fixes it automatically would be perfect.
I know about the variable attribute __attribute__((endian(host))) / __attribute__((endian(device))), but this doesn't help with a big-endian FITS file on a little-endian platform using a little-endian device.
I thought about a solution like this one (neither implemented nor tested yet):
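// shuffle() needs a mask whose element size matches the data, so uchar4 rather than uint4
uchar4 mask = (uchar4)(3, 2, 1, 0);
uchar4 swappedEndianness = shuffle(originalEndianness, mask);
// to be applied to a float/int buffer somehow, e.g. by reinterpreting each element with as_uchar4()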
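The "extra kernel run" idea could also be written with plain shifts and masks instead of shuffle. Below is a minimal sketch in host-side C so it can be checked without a device. The names bswap32, bswap16 and swap_buffer32 are hypothetical, and the loop only stands in for a kernel: in OpenCL C each iteration would presumably be one work-item with i = get_global_id(0). It assumes float data is handed over as raw 32-bit words and reinterpreted only after the swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word (int, or the raw bits of a float). */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverse the byte order of one 16-bit word (short). */
static uint16_t bswap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Host-side stand-in for the kernel body (assumption: one element per work-item). */
static void swap_buffer32(uint32_t *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        buf[i] = bswap32(buf[i]);
}
```

Whether this beats the shuffle variant on a given device is an open question; both touch every element exactly once, so the cost is likely dominated by memory bandwidth.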
Hoping there's a better solution out there.
Thanks in advance, runtimeterror