c++ - Correctly dealing with byte alignment issues between a 16-bit embedded system and a 32-bit desktop via UDP

Question

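The application I'm working on receives C-style structs from an embedded system whose code was generated for a 16-bit processor. The application that talks to the embedded system is built with either a 32-bit gcc compiler or a 32-bit MSVC C++ compiler. Communication between the application and the embedded system takes place over UDP packets, either over Ethernet or over a modem.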

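The payload of the UDP packets consists of various C-style structs. On the application side, a C++-style reinterpret_cast takes the unsigned byte array and converts it to the appropriate struct.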

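However, I run into trouble when a struct contains enum values. The 16-bit Watcom compiler treats enum values as type uint8_t, but on the application side enum values are treated as 32-bit values. When I receive a packet that contains enum values, the data comes out garbled, because the application-side struct is larger than the embedded-side struct.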

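The workaround so far has been to change the enum types in the application-side structs to uint8_t. That is not an ideal solution, though, because we can no longer use those members as enum types.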

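What I'm looking for is a solution that lets me keep using a simple cast operation without having to tamper with the struct definitions in the application-side source. That way I can use the structs as-is in the upper layers of the application.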

score 2 · Accepted Answer

As has already been said, the correct way to handle this is proper serialization and deserialization.

But that doesn't mean we can't try a few tricks.

Option 1: If your particular compiler supports packed enums (in my case, gcc 4.7 on Windows), this may work:

typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 }__attribute__ ((__packed__)) TheRealEnum;

Option 2:

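If your particular compiler supports class sizes smaller than 4 bytes, you can use a HackedEnum class that performs the conversion through operator overloading (note the gcc attribute, which you may not want):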

class HackedEnum
{
private:
    uint8_t evalue;
public:
    HackedEnum& operator=(const TheRealEnum v) { evalue = static_cast<uint8_t>(v); return *this; }
    operator TheRealEnum() const { return static_cast<TheRealEnum>(evalue); }
}__attribute__((packed));

You can replace TheRealEnum in your structs with HackedEnum and still keep using it as a TheRealEnum.

A complete example showing that it works:

#include <iostream>
#include <stddef.h>
#include <cstdint>

using namespace std;

#pragma pack(push, 1)

typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 } TheRealEnum;

typedef struct
{
    uint16_t v1;
    uint8_t enumValue;
    uint16_t v2;
}__attribute__((packed)) ShortStruct;

typedef struct
{
    uint16_t v1;
    TheRealEnum enumValue;
    uint16_t v2;
}__attribute__((packed)) LongStruct;

class HackedEnum
{
private:
    uint8_t evalue;
public:
    HackedEnum& operator=(const TheRealEnum v) { evalue = static_cast<uint8_t>(v); return *this; }
    operator TheRealEnum() const { return static_cast<TheRealEnum>(evalue); }
}__attribute__((packed));

typedef struct
{
    uint16_t v1;
    HackedEnum enumValue;
    uint16_t v2;
}__attribute__((packed)) HackedStruct;

#pragma pack(pop)

int main(int argc, char **argv)
{
    cout << "Sizes: " << endl
         << "TheRealEnum: " << sizeof(TheRealEnum) << endl
         << "ShortStruct: " << sizeof(ShortStruct) << endl
         << "LongStruct: " << sizeof(LongStruct) << endl
         << "HackedStruct: " << sizeof(HackedStruct) << endl;

    ShortStruct ss;
    cout << "address of ss: " << &ss <<  " size " << sizeof(ss) <<endl
         << "address of ss.v1: " << (void*)&ss.v1 << endl
         << "address of ss.ev: " << (void*)&ss.enumValue << endl
         << "address of ss.v2: " << (void*)&ss.v2 << endl;

    LongStruct ls;
    cout << "address of ls: " << &ls <<  " size " << sizeof(ls) <<endl
         << "address of ls.v1: " << (void*)&ls.v1 << endl
         << "address of ls.ev: " << (void*)&ls.enumValue << endl
         << "address of ls.v2: " << (void*)&ls.v2 << endl;

    HackedStruct hs;
    cout << "address of hs: " << &hs <<  " size " << sizeof(hs) <<endl
         << "address of hs.v1: " << (void*)&hs.v1 << endl
         << "address of hs.ev: " << (void*)&hs.enumValue << endl
         << "address of hs.v2: " << (void*)&hs.v2 << endl;


    uint8_t buffer[512] = {0};

    ShortStruct * short_ptr = (ShortStruct*)buffer;
    LongStruct * long_ptr = (LongStruct*)buffer;
    HackedStruct * hacked_ptr = (HackedStruct*)buffer;

    short_ptr->v1 = 1;
    short_ptr->enumValue = VALUE_2;
    short_ptr->v2 = 3;

    cout << "Values of short: " << endl
            << "v1 = " << short_ptr->v1 << endl
            << "ev = " << (int)short_ptr->enumValue << endl
            << "v2 = " << short_ptr->v2 << endl;

    cout << "Values of long: " << endl
            << "v1 = " << long_ptr->v1 << endl
            << "ev = " << long_ptr->enumValue << endl
            << "v2 = " << long_ptr->v2 << endl;

    cout << "Values of hacked: " << endl
            << "v1 = " << hacked_ptr->v1 << endl
            << "ev = " << hacked_ptr->enumValue << endl
            << "v2 = " << hacked_ptr->v2 << endl;



    HackedStruct hs1, hs2;

    // hs1.enumValue = 1; // error, the value is not the wanted enum

    hs1.enumValue = VALUE_1;
    int a = hs1.enumValue;
    TheRealEnum b = hs1.enumValue;
    hs2.enumValue = hs1.enumValue;

    return 0;
}

The output on my particular system is:

Sizes:
TheRealEnum: 4
ShortStruct: 5
LongStruct: 8
HackedStruct: 5
address of ss: 0x22ff17 size 5
address of ss.v1: 0x22ff17
address of ss.ev: 0x22ff19
address of ss.v2: 0x22ff1a
address of ls: 0x22ff0f size 8
address of ls.v1: 0x22ff0f
address of ls.ev: 0x22ff11
address of ls.v2: 0x22ff15
address of hs: 0x22ff0a size 5
address of hs.v1: 0x22ff0a
address of hs.ev: 0x22ff0c
address of hs.v2: 0x22ff0d
Values of short:
v1 = 1
ev = 2
v2 = 3
Values of long:
v1 = 1
ev = 770
v2 = 0
Values of hacked:
v1 = 1
ev = 2
v2 = 3

score 1 · Accepted Answer

"On the application side, a C++-style reinterpret_cast takes the unsigned byte array and converts it to the appropriate struct."

Struct layout is not required to be the same across different implementations, so using reinterpret_cast this way is not appropriate.

"The 16-bit Watcom compiler treats enum values as type uint8_t, but on the application side enum values are treated as 32-bit values."

The underlying type of an enum is chosen by the implementation, in an implementation-defined way.

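This is just one of many potential differences between implementations that can cause problems for reinterpret_cast. If you aren't careful, there are also real alignment issues, where data in the receive buffer isn't suitably aligned for its type (for example, an integer that requires four-byte alignment ends up one byte off), which can cause crashes or poor performance. Padding can differ between platforms, the sizes of fundamental types can differ, endianness can differ, and so on.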

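"What I'm looking for is a solution that lets me keep using a simple cast operation without having to tamper with the struct definitions in the application-side source. That way I can use the structs as-is in the upper layers of the application."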

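C++11 introduced a new enum syntax that lets you specify the underlying type. Alternatively, you can replace the enum with an integer type plus a set of predefined constants with manually assigned values. Either way, this only solves the problem you asked about, not any of the other problems you have.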

What you really should be doing is proper serialization and deserialization.

score 0 · Accepted Answer

Put your enum type in a union with a 32-bit number:

union
{
  Enumerated val;
  uint32_t valAsUint32;
};

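This forces the field to 32 bits on the embedded side as well. It should work as long as both platforms are little-endian and the structs are zero-filled initially. Note, though, that this changes the wire format.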

score 0 · Accepted Answer

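If by "simple cast operation" you mean something expressed in the source code, rather than something that has to be zero-copy, then you can write two versions of the struct, one with enums and one with uint8_ts, and give one of them a constructor that copies from the other element by element to repack it. You can then use ordinary casts in the rest of your code. Because the data sizes are fundamentally different (unless you use the C++11 feature mentioned in another answer), you cannot do this without copying the contents to repack them.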

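However, if you don't mind making a few small changes to the struct definitions on the application side, there are a couple of options that don't involve dealing with bare uint8_t values. You can use aaronps's answer, with a uint8_t-sized class (assuming your compiler allows that) that converts implicitly to the enum. Or you can store the values as uint8_ts and write accessor methods for your enum values that take the uint8_t data in the struct and convert it to the enum before returning it.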
