c++ - C/C++: packing signed chars into an int

Question

I need to pack four signed bytes into a 32-bit integral type. This is what I came up with:

int32_t byte(int8_t c) { return (unsigned char)c; }

int pack(char c0, char c1, ...) {
  return byte(c0) | byte(c1) << 8 | ...;
}

Is this a good solution? Is it portable (not in the communication sense)? Is there a ready-made solution, perhaps Boost?

What concerns me most is the bit order when converting negative values from char to int. I don't know what the correct behavior should be.
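To make the concern concrete, here is a small sketch (widen_signed and widen_via_unsigned are hypothetical names; int8_t is guaranteed to be two's complement when it exists). Widening a negative value directly copies the sign bit into every upper bit, while going through unsigned char first keeps only the original 8-bit pattern:

```cpp
#include <cstdint>

// Converting a negative int8_t straight to int32_t preserves its *value*,
// so -1 becomes 0xFFFFFFFF: the sign bit is copied into all upper bits.
std::uint32_t widen_signed(std::int8_t c) {
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(c));
}

// Converting to unsigned char first keeps only the low 8 bits of the
// two's complement pattern (value modulo 256), so -1 becomes 0x000000FF.
std::uint32_t widen_via_unsigned(std::int8_t c) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}
```

This is why the question's byte() helper casts to unsigned char before the value is shifted and OR-ed: without it, the upper bits of a negative byte would overwrite its neighbours.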

Thanks

score 8 · Accepted Answer

char isn't guaranteed to be signed or unsigned (on PowerPC Linux, char defaults to unsigned). Spread the word!
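If you want to know which way your own compiler goes, a minimal sketch (char_is_signed is a hypothetical helper name) can check it at compile time: CHAR_MIN is 0 when plain char is unsigned and negative when it is signed.

```cpp
#include <climits>
#include <limits>

// True if plain char is signed on this implementation.
constexpr bool char_is_signed() {
    return CHAR_MIN < 0;
}

// The same information is available from <limits>; the two must agree.
static_assert(std::numeric_limits<char>::is_signed == char_is_signed(),
              "<limits> and <climits> disagree about char signedness");
```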

What you want is something like this macro:

#include <stdint.h> /* Needed for uint32_t and uint8_t */

#define PACK(c0, c1, c2, c3) \
    (((uint32_t)(uint8_t)(c0) << 24) | \
    ((uint32_t)(uint8_t)(c1) << 16) | \
    ((uint32_t)(uint8_t)(c2) << 8) | \
    ((uint32_t)(uint8_t)(c3)))

It's ugly mainly because every argument and subexpression has to be parenthesized to survive macro expansion and C's operator precedence. The backslash-newlines are there so the macro doesn't have to be one long line.

Also, the reason we cast to uint8_t before casting to uint32_t is to prevent unwanted sign extension.
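A quick compile-time check of the macro with negative arguments (the macro is repeated so the snippet stands alone; static_assert assumes C++11). Each argument contributes exactly one byte, with no sign extension leaking into its neighbours. Note that this macro puts c0 in the most significant byte, whereas the question's version puts c0 in the least significant one.

```cpp
#include <stdint.h>

// Same macro as above, repeated so this snippet compiles on its own.
#define PACK(c0, c1, c2, c3) \
    (((uint32_t)(uint8_t)(c0) << 24) | \
     ((uint32_t)(uint8_t)(c1) << 16) | \
     ((uint32_t)(uint8_t)(c2) << 8) | \
     ((uint32_t)(uint8_t)(c3)))

// -1 becomes 0xFF and -128 becomes 0x80 in their own byte positions.
static_assert(PACK(1, 2, 3, -1) == 0x010203FFu, "low byte");
static_assert(PACK(-128, 0, 0, 0) == 0x80000000u, "high byte");
static_assert(PACK(-1, -1, -1, -1) == 0xFFFFFFFFu, "all bytes");
```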

score 7 · Accepted Answer

I liked Joey Adams's answer, except that it is written as a macro (macros cause real pain in many situations) and the compiler will not warn you if 'char' isn't 8 bits wide. This is my solution (based on Joey's).

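#include <stdint.h>

inline uint32_t PACK(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    // Widen to uint32_t before shifting: a uint8_t is promoted only to
    // (signed) int, and e.g. 0x80 << 24 overflows a 32-bit int.
    return ((uint32_t)c0 << 24) | ((uint32_t)c1 << 16) |
           ((uint32_t)c2 << 8) | (uint32_t)c3;
}

inline uint32_t PACK(int8_t c0, int8_t c1, int8_t c2, int8_t c3) {
    return PACK((uint8_t)c0, (uint8_t)c1, (uint8_t)c2, (uint8_t)c3);
}

I used C-style casts so the expressions read the same in C and C++ (the OP tagged both), but overloading PACK on int8_t requires C++; in C you would need two differently named functions. The casts to uint32_t can't be left to the compiler: integer promotion turns uint8_t into int, not uint32_t.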

score 4 · Accepted Answer

You can avoid the casts by using implicit conversions:

#include <stdint.h>
#include <iostream>

uint32_t pack_helper(uint32_t c0, uint32_t c1, uint32_t c2, uint32_t c3) {
    return c0 | (c1 << 8) | (c2 << 16) | (c3 << 24);
}

uint32_t pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return pack_helper(c0, c1, c2, c3);
}

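The idea is that the reader sees "convert all the parameters properly, then shift and combine them" rather than "for each parameter, convert it properly, shift it and combine it". There's not much in it, though.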

Then:

template <int N>
uint8_t unpack_u(uint32_t packed) {
    // cast to avoid potential warnings for implicit narrowing conversion
    return static_cast<uint8_t>(packed >> (N*8));
}

template <int N>
int8_t unpack_s(uint32_t packed) {
    uint8_t r = unpack_u<N>(packed);
    return (r <= 127 ? r : r - 256); // thanks to caf
}

int main() {
    uint32_t x = pack(4,5,6,-7);
    std::cout << (int)unpack_u<0>(x) << "\n";
    std::cout << (int)unpack_s<1>(x) << "\n";
    std::cout << (int)unpack_u<3>(x) << "\n";
    std::cout << (int)unpack_s<3>(x) << "\n";
}

Output:

4
5
249
-7

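This is as portable as the types uint32_t, uint8_t and int8_t. None of them is required by C99, and the header stdint.h isn't part of C89 or of C++ before C++11. If the types exist and meet the C99 requirements, though, the code will work. Of course, in C the unpack functions would need a function parameter instead of a template parameter. You might prefer that in C++ too, if you want to write short loops for unpacking.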
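A sketch of the function-parameter variant, which also allows a short unpacking loop. The names mirror the template version above, but the runtime-index signatures and unpack_all_s are hypothetical:

```cpp
#include <array>
#include <cstdint>

// Byte n (0 = least significant) as an unsigned value; n must be 0..3.
std::uint8_t unpack_u(std::uint32_t packed, int n) {
    return static_cast<std::uint8_t>(packed >> (n * 8));
}

// Byte n as a signed value in -128..127, using the same
// "r <= 127 ? r : r - 256" mapping as the template version.
std::int8_t unpack_s(std::uint32_t packed, int n) {
    int r = unpack_u(packed, n);
    return static_cast<std::int8_t>(r <= 127 ? r : r - 256);
}

// Unpacks all four signed bytes with a loop, lowest byte first.
std::array<std::int8_t, 4> unpack_all_s(std::uint32_t packed) {
    std::array<std::int8_t, 4> out{};
    for (int n = 0; n < 4; ++n) {
        out[n] = unpack_s(packed, n);
    }
    return out;
}
```

With the packed value from the example above (0xF9060504, i.e. pack(4,5,6,-7)), the loop yields 4, 5, 6, -7.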

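To get around the fact that these types are optional, you could use uint_least32_t, which C99 does require, and likewise uint_least8_t and int_least8_t. You would have to change the code of pack_helper and unpack_u: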

uint_least32_t mask(uint_least32_t x) { return x & 0xFF; }

uint_least32_t pack_helper(uint_least32_t c0, uint_least32_t c1, uint_least32_t c2, uint_least32_t c3) {
    return mask(c0) | (mask(c1) << 8) | (mask(c2) << 16) | (mask(c3) << 24);
}

template <int N>
uint_least8_t unpack_u(uint_least32_t packed) {
    // cast to avoid potential warnings for implicit narrowing conversion
    return static_cast<uint_least8_t>(mask(packed >> (N*8)));
}

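Honestly, this is unlikely to be worth it: chances are the rest of your application is written on the assumption that int8_t etc. do exist. It's a rare implementation that doesn't have 8-bit and 32-bit two's complement types.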
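If you'd rather rely on the exact-width types but fail loudly on such a rare implementation, a minimal sketch, assuming C++11 <cstdint>: C99 and C++11 specify that the limit macros for an exact-width type are defined only when that type is provided, so their absence can be caught by the preprocessor.

```cpp
#include <cstdint>

// The limit macros of an exact-width type exist only if the type does.
#if !defined(INT8_MAX) || !defined(UINT8_MAX) || !defined(UINT32_MAX)
#error "exact-width types missing: fall back to the uint_least*_t version"
#endif

// When they exist, the exact-width types have exactly these ranges
// (no padding bits, two's complement for the signed one).
static_assert(INT8_MIN == -128 && INT8_MAX == 127, "int8_t range");
static_assert(UINT8_MAX == 255, "uint8_t range");
static_assert(UINT32_MAX == 0xFFFFFFFFu, "uint32_t range");
```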

score 1 · Accepted Answer

"Goodness"
IMHO, this is the best solution you're going to get for this. EDIT: though I'd use static_cast<unsigned int> instead of the C-style cast, and I'd probably not use a separate function to hide the cast.
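A sketch of the question's pack() written inline with static_cast (the four-parameter signature is an assumption filling in the question's elided parameters). One caveat: the cast still has to go through unsigned char first, because static_cast<unsigned int> applied directly to a negative char would sign-extend it to 0xFFFFFFFF.

```cpp
#include <cstdint>

// Question's byte order: c0 ends up in the least significant byte.
std::uint32_t pack(char c0, char c1, char c2, char c3) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c0))
         | static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) << 8
         | static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 16
         | static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 24;
}
```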

Portability:
There is no fully portable way to do this, because nothing says char has to be eight bits, and nothing says unsigned int has to be 4 bytes wide.
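Those assumptions can at least be turned into compile errors instead of silent misbehaviour. A minimal sketch, assuming C++11 for static_assert:

```cpp
#include <climits>
#include <limits>

// Packing four chars into one integer assumes 8-bit bytes...
static_assert(CHAR_BIT == 8, "packing assumes 8-bit char");

// ...and an unsigned int with at least 32 value bits to hold them.
static_assert(std::numeric_limits<unsigned int>::digits >= 32,
              "packing into unsigned int assumes at least 32 bits");
```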

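Furthermore, although packing with shifts gives the same integer value on every machine, if you write the packed int's bytes out directly (e.g. to a file or socket), their order in memory depends on endianness, so that data won't read back correctly on a machine with the opposite endianness.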
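A sketch of where byte order does and doesn't matter (pack_be, to_bytes_be and to_bytes_native are hypothetical names). Shifts operate on values, so packing and shift-based serialization are endianness-independent; only copying the integer's object representation exposes the host's byte order.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Value-level packing: gives the same integer on every host.
std::uint32_t pack_be(std::uint8_t c0, std::uint8_t c1,
                      std::uint8_t c2, std::uint8_t c3) {
    return (std::uint32_t(c0) << 24) | (std::uint32_t(c1) << 16) |
           (std::uint32_t(c2) << 8) | std::uint32_t(c3);
}

// Portable serialization: most significant byte first, via shifts.
std::array<unsigned char, 4> to_bytes_be(std::uint32_t v) {
    return {static_cast<unsigned char>(v >> 24),
            static_cast<unsigned char>(v >> 16),
            static_cast<unsigned char>(v >> 8),
            static_cast<unsigned char>(v)};
}

// Non-portable: copies the in-memory representation, so out[0] is the
// low byte on little-endian hosts and the high byte on big-endian ones.
std::array<unsigned char, 4> to_bytes_native(std::uint32_t v) {
    std::array<unsigned char, 4> out{};
    std::memcpy(out.data(), &v, sizeof v);
    return out;
}
```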

Is there a ready-made solution, perhaps Boost?
Not that I'm aware of.

score 1 · Accepted Answer

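This is based on Grant Peters' and Joey Adams' answers, extended to show how to unpack the signed values (the unpack functions rely on the modulo rules for unsigned values in C):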

(As Steve Jessop pointed out in the comments, separate pack_s and pack_u functions aren't needed.)

#include <stdint.h>

inline uint32_t pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3)
{
    return ((uint32_t)c0 << 24) | ((uint32_t)c1 << 16) |
        ((uint32_t)c2 << 8) | (uint32_t)c3;
}

inline uint8_t unpack_c3_u(uint32_t p)
{
    return p >> 24;
}

inline uint8_t unpack_c2_u(uint32_t p)
{
    return p >> 16;
}

inline uint8_t unpack_c1_u(uint32_t p)
{
    return p >> 8;
}

inline uint8_t unpack_c0_u(uint32_t p)
{
    return p;
}

// The signed unpackers return int8_t. t is in 0..255, and values above 127
// are mapped to -128..-1 arithmetically before the conversion to int8_t.
inline int8_t unpack_c3_s(uint32_t p)
{
    int t = unpack_c3_u(p);
    return t <= 127 ? t : t - 256;
}

inline int8_t unpack_c2_s(uint32_t p)
{
    int t = unpack_c2_u(p);
    return t <= 127 ? t : t - 256;
}

inline int8_t unpack_c1_s(uint32_t p)
{
    int t = unpack_c1_u(p);
    return t <= 127 ? t : t - 256;
}

inline int8_t unpack_c0_s(uint32_t p)
{
    int t = unpack_c0_u(p);
    return t <= 127 ? t : t - 256;
}

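(These are necessary, rather than simply casting back to int8_t, because for values above 127 the latter gives an implementation-defined result or may raise an implementation-defined signal, so it isn't strictly portable.)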
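The mapping can be checked exhaustively, since there are only 256 byte values. A sketch (to_signed and round_trips_all are hypothetical names): converting a signed value to uint8_t is well defined (modulo 256), and to_signed must undo it for every value.

```cpp
#include <cstdint>

// Same mapping the unpack_cN_s functions use: 0..127 stay as they are,
// 128..255 become -128..-1, so the final conversion is always in range.
std::int8_t to_signed(std::uint8_t u) {
    int t = u;
    return static_cast<std::int8_t>(t <= 127 ? t : t - 256);
}

// Round-trips every value in -128..127 through uint8_t and back.
bool round_trips_all() {
    for (int v = -128; v <= 127; ++v) {
        std::uint8_t u = static_cast<std::uint8_t>(v);
        if (to_signed(u) != v) return false;
    }
    return true;
}
```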

score -2 · Accepted Answer

You can also let the compiler do the work for you.

union packedchars {
  struct {
    char v1, v2, v3, v4;
  };  // anonymous struct: standard in C11, a widely supported extension in C++
  int data;
};

union packedchars value;
value.data = 0;
value.v1 = 'a';
value.v2 = 'b';

And so on.
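Two caveats apply to the union: reading data after writing v1..v4 is allowed in C but undefined behaviour in C++, and which byte of data v1 lands in depends on endianness. A sketch of the same "let the compiler lay it out" idea that is well defined in C++, swapping the union for std::memcpy (Chars4, to_int and from_int are hypothetical names); the byte order is still host-dependent:

```cpp
#include <cstdint>
#include <cstring>

// Four chars laid out consecutively, like the struct inside the union.
struct Chars4 {
    char v1, v2, v3, v4;
};
static_assert(sizeof(Chars4) == sizeof(std::uint32_t), "expects no padding");

// Copies the struct's bytes into an integer. On little-endian hosts v1
// ends up in the low byte; on big-endian hosts, in the high byte.
std::uint32_t to_int(const Chars4& c) {
    std::uint32_t data;
    std::memcpy(&data, &c, sizeof data);
    return data;
}

// Inverse operation: recovers the four chars from the integer.
Chars4 from_int(std::uint32_t data) {
    Chars4 c;
    std::memcpy(&c, &data, sizeof c);
    return c;
}
```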
