I am looking for the fastest way to read numerical values stored in binary files.

I have written some functions that seem to work, but I'd like feedback on whether my implementation is good.

Here is how I get a signed integer from a 4-byte little-endian block:

signed long int from4li(char const * const buffer)
{
    signed long int value = 0;

    value += (unsigned char) buffer[3];
    value <<= 8;
    value += (unsigned char) buffer[2];
    value <<= 8;
    value += (unsigned char) buffer[1];
    value <<= 8;
    value += (unsigned char) buffer[0];

    return value;
}

This would also work for unsigned integers, but I originally wrote a different implementation for unsigned integers (it fails with signed integers, and I don't know exactly why):

unsigned long int fromu4li(char const * const buffer)
{
    unsigned long int value = 0;

    value += (unsigned char) buffer[0] << 8 * 0;
    value += (unsigned char) buffer[1] << 8 * 1;
    value += (unsigned char) buffer[2] << 8 * 2;
    value += (unsigned char) buffer[3] << 8 * 3;

    return value;
}

I am more confident about the conversion from an integer to a little-endian byte buffer, which I think probably can't be optimized further:

void to4li(long int const value, char * const buffer)
{
    buffer[0] = value >> 8 * 0;
    buffer[1] = value >> 8 * 1;
    buffer[2] = value >> 8 * 2;
    buffer[3] = value >> 8 * 3;
}

I also think it could be even faster using memcpy, but to use memcpy I have to know the endianness of the host system.

I don't really want to rely on the endianness of the host system, as I think that my code should be independent of the internal data representation of the host system.

So, is this a proper way of doing those conversions, or can I improve my functions?

2 Answers

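Using bitwise OR seems like a good idea, but there is something odd about it:

After testing, it seems that c0, c1, c2, c3 need to be unsigned char for this solution to work. Again, I don't know why:

Take 0x8080 as an example, which is -32640 (signed 16-bit) or 32896 (unsigned).

With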

char c0 = 0x80;
char c1 = 0x80;

I get:

uint16_t res = (c0 << 0) | (c1 << 8);
// res = 65408 ???

uint16_t res = ((unsigned char) c0 << 0) | ((unsigned char) c1 << 8);
// res = 32896 ok
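
A likely explanation, assuming a platform where plain char is signed and 8 bits wide (common on x86, but not guaranteed): every char operand is promoted to int before the shift or OR is applied, so a char holding 0x80 becomes -128 and its sign bits fill the upper byte of the result. The sketch below reproduces both results; the helper names combine_signed and combine_unsigned are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.
// Operands narrower than int are promoted to int before arithmetic,
// so a negative signed char is sign-extended.
std::uint16_t combine_signed(signed char c0, signed char c1)
{
    // c0 == -128 promotes to an int with every upper bit set.
    // Multiplying by 256 stands in for << 8, because left-shifting a
    // negative value is undefined before C++20.
    return static_cast<std::uint16_t>(c0 | (c1 * 256));
}

std::uint16_t combine_unsigned(signed char c0, signed char c1)
{
    // Converting to unsigned char first keeps each byte in the range 0..255.
    return static_cast<std::uint16_t>(
        static_cast<unsigned char>(c0) | (static_cast<unsigned char>(c1) << 8));
}
```

Called with two bytes of value -128 (bit pattern 0x80), combine_signed yields 65408 and combine_unsigned yields 32896, matching the two results above.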
answered 2012-06-16T15:04:10.547

A simpler approach, and one that avoids the undefined behaviour caused by bit-shifting signed integer variables, is to copy the data byte for byte:

int32_t get(char const * const buf)
{
    int32_t result;
    char * const p = reinterpret_cast<char *>(&result);
    std::copy(buf, buf + sizeof result, p);
    return result;
}

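This code assumes that the data has the same byte order as the machine. Alternatively, you can use std::reverse_copy to reverse the byte order (std::copy_backward would not do this, since it preserves the relative order of the elements).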
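
For data whose byte order is the opposite of the host's, one option is std::reverse_copy from <algorithm>, which writes the source range to the destination in reverse order. A minimal sketch under that assumption follows; the name get_reversed is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical name; reads a 4-byte value stored in the byte order
// opposite to the host's.
std::int32_t get_reversed(char const * const buf)
{
    std::int32_t result;
    char * const p = reinterpret_cast<char *>(&result);
    // std::reverse_copy writes the last source byte first, and so on.
    std::reverse_copy(buf, buf + sizeof result, p);
    return result;
}
```

Like get above, this still depends on knowing how the stream's byte order relates to the host's.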

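This approach depends on both the stream data and the host endianness, so it is less elegant than the algebraic solution for unsigned integers, which depends only on the stream data. However, since signed integers are platform-dependent anyway, this should be an acceptable compromise.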
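
If host independence matters for signed values too, one possible sketch is to assemble the bit pattern with unsigned arithmetic and then copy it into a signed variable. This relies on int32_t being a two's complement type with no padding, which the fixed-width types guarantee where they exist. The name get_i32_le is hypothetical, and the function assumes a little-endian stream.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name; reads a little-endian 32-bit signed value on any host.
std::int32_t get_i32_le(unsigned char const * const buf)
{
    // Assemble the bit pattern with unsigned arithmetic, which is well defined.
    std::uint32_t const bits =
          static_cast<std::uint32_t>(buf[0])
        | static_cast<std::uint32_t>(buf[1]) << 8
        | static_cast<std::uint32_t>(buf[2]) << 16
        | static_cast<std::uint32_t>(buf[3]) << 24;

    // Reinterpret the same bits as a signed value without any signed shifts.
    std::int32_t result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Here memcpy only moves bits between two host variables of the same size, so the host's endianness does not affect the result.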

(Just for comparison, for unsigned integers I prefer this machine-independent code:

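// Requires <cstddef> and <type_traits>.
template <typename UInt>
typename std::enable_if<std::is_unsigned<UInt>::value, UInt>::type
get_from_le(unsigned char const * const buf)
{
    UInt result = 0;  // must start at zero before the bytes are accumulated
    for (std::size_t i = 0; i != sizeof(UInt); ++i)
        // Widen to UInt before shifting so that shifts of 32 bits or more are valid.
        result += static_cast<UInt>(buf[i]) << (8 * i);
    return result;
}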

Usage: auto ui = get_from_le<uint64_t>(buf);

For a big-endian version, replace [i] with [sizeof(UInt) - i - 1].)
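
Spelled out, the big-endian substitution might look like the sketch below. The name get_from_be is hypothetical; the body is otherwise the same loop, with the accumulator zero-initialized and each byte widened before it is shifted.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical big-endian counterpart of get_from_le.
template <typename UInt>
typename std::enable_if<std::is_unsigned<UInt>::value, UInt>::type
get_from_be(unsigned char const * const buf)
{
    UInt result = 0;
    for (std::size_t i = 0; i != sizeof(UInt); ++i)
        // The byte i positions from the end of the buffer supplies bits 8*i to 8*i+7.
        result += static_cast<UInt>(buf[sizeof(UInt) - i - 1]) << (8 * i);
    return result;
}
```

For the bytes 0x12, 0x34, 0x56, 0x78, get_from_be<std::uint32_t> returns 0x12345678.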

answered 2012-06-16T21:10:19.127