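The context of my question is network programming. Say I want to send messages over the network between two programs. For simplicity, assume the messages look like the struct below and that byte order is not a concern. I want to find a correct, portable, and efficient way to define these messages as C structures. I know of four approaches: explicit casting, casting through a union, copying, and marshaling.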
struct message {
    uint16_t logical_id;
    uint16_t command;
};
Explicit casting:
void send_message(struct message *msg) {
    uint8_t *bytes = (uint8_t *) msg;
    /* call to write/send/sendto here */
}
void receive_message(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message*) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}
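My understanding is that send_message does not violate the aliasing rules, because a byte/char pointer may alias any type. The converse is not true, however, so receive_message violates the aliasing rules and therefore has undefined behavior.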
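To make the permitted direction concrete, here is a minimal sketch of reading a struct's bytes through a character pointer. The function name checksum_message is hypothetical and not part of any code above. It uses unsigned char rather than uint8_t because uint8_t is usually, but as far as I know not necessarily, a typedef for a character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Hypothetical helper: sums the bytes of the object representation.
 * Reading any object through an unsigned char pointer is allowed by
 * the aliasing rules, so this direction is well defined. */
static unsigned checksum_message(const struct message *msg)
{
    const unsigned char *bytes = (const unsigned char *)msg;
    unsigned sum = 0;

    for (size_t i = 0; i < sizeof *msg; i++)
        sum += bytes[i];
    return sum;
}
```

The result does not depend on byte order because it only adds the bytes up; it does assume the struct has no padding.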
Casting through a union:
union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};
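void receive_message_union(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    /* explicit cast: uint8_t * does not convert implicitly to union message_u * */
    union message_u *msgu = (union message_u *) bytes;
    /* And now use the message */
    if (msgu->m.command == SELF_DESTRUCT) {
        /* ... */
    }
}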
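However, this seems to violate the idea that a union contains only one of its members at any given time. It also seems likely to cause alignment problems if the source buffer is not aligned on a word/half-word boundary.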
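One way to sidestep both concerns, as far as I can tell, is to make the union object itself the receive buffer instead of casting a foreign byte pointer to it. The sketch below assumes C11 (for _Alignof); fake_recv, aligned_for_message, and receive_command are hypothetical names, with fake_recv standing in for read/recv. Reading a different member than the one last written is, to my understanding, described as type punning in a footnote to C11 6.5.2.3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};

/* Hypothetical check: nonzero if p meets struct message's alignment.
 * The pointer-to-integer conversion is implementation-defined, but it
 * behaves as expected on common flat-address platforms. */
static int aligned_for_message(const void *p)
{
    return (uintptr_t)p % _Alignof(struct message) == 0;
}

/* Hypothetical stand-in for read()/recv(): writes a fixed message image. */
static size_t fake_recv(uint8_t *dst, size_t cap)
{
    struct message src = { 7, 42 };
    size_t n = sizeof src < cap ? sizeof src : cap;

    memcpy(dst, &src, n);
    return n;
}

static uint16_t receive_command(void)
{
    union message_u u;  /* storage is suitably aligned for struct message */
    size_t n = fake_recv(u.bytes, sizeof u.bytes);

    assert(n == sizeof u.bytes);
    return u.m.command; /* accessed through the union object itself */
}
```

This only helps when the caller controls the buffer; it does nothing for a pointer into the middle of someone else's packet.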
Copying:
void receive_message_copy(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message msg;
    memcpy(&msg, bytes, sizeof msg);
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}
This seems guaranteed to produce the correct result, but of course I would much prefer not to have to copy the data.
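If the cost of copying the whole struct is the worry, a variation is to copy only the field that is actually needed, straight out of the buffer. This is a sketch under the same assumption as the rest of the question (the wire layout matches the struct layout); message_command is a hypothetical accessor name. Optimizing compilers commonly turn a small fixed-size memcpy like this into a single load, though that is an optimization rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Hypothetical accessor: reads the command field from a raw buffer.
 * memcpy has no alignment or aliasing requirements, so bytes may point
 * anywhere, including an odd address inside a larger packet. */
static uint16_t message_command(const uint8_t *bytes)
{
    uint16_t value;

    memcpy(&value, bytes + offsetof(struct message, command), sizeof value);
    return value;
}
```

The caller is still responsible for checking that the buffer holds at least sizeof(struct message) bytes before calling it.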
Marshaling:
void send_message(struct message *msg) {
    uint8_t bytes[4];
    /* msg is a pointer, so members are accessed with -> */
    bytes[0] = msg->logical_id >> 8;
    bytes[1] = msg->logical_id & 0xff;
    bytes[2] = msg->command >> 8;
    bytes[3] = msg->command & 0xff;
    /* call to write/send/sendto here */
}
void receive_message_marshal(uint8_t *bytes, size_t len) {
    /* No longer relying on the size of the struct being meaningful */
    assert(len >= 4);
    struct message msg;
    msg.logical_id = (bytes[0] << 8) | bytes[1]; /* Big-endian */
    msg.command = (bytes[2] << 8) | bytes[3];
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}
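This still has to copy, but it is now decoupled from the representation of the struct. The price is that we must be explicit about the position and size of each member, and endianness becomes a much more obvious issue.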
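The per-member bookkeeping can be kept in one place with a pair of small helpers. The sketch below is one possible arrangement, not an established API: put_u16_be, get_u16_be, message_encode, message_decode, and MESSAGE_WIRE_SIZE are all hypothetical names, and big-endian wire order is an assumption carried over from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

enum { MESSAGE_WIRE_SIZE = 4 }; /* two big-endian 16-bit fields */

/* Store a 16-bit value most significant byte first. */
static void put_u16_be(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xff);
}

/* Load a 16-bit value stored most significant byte first. */
static uint16_t get_u16_be(const uint8_t *in)
{
    return (uint16_t)((in[0] << 8) | in[1]);
}

/* Writes exactly MESSAGE_WIRE_SIZE bytes to out. */
static void message_encode(uint8_t *out, const struct message *msg)
{
    put_u16_be(out, msg->logical_id);
    put_u16_be(out + 2, msg->command);
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int message_decode(struct message *msg, const uint8_t *in, size_t len)
{
    if (len < MESSAGE_WIRE_SIZE)
        return -1;
    msg->logical_id = get_u16_be(in);
    msg->command = get_u16_be(in + 2);
    return 0;
}
```

Because the helpers only use shifts and masks on individual bytes, the result is the same on little-endian and big-endian hosts, and neither alignment nor struct padding matters.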
Related info:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
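Real-world example
I've been looking for examples of networking code to see how this situation is handled elsewhere. lwIP (Lightweight IP) has a few similar cases. The udp.c file contains the following code: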
/**
 * Process an incoming UDP datagram.
 *
 * Given an incoming UDP datagram (as a chain of pbufs) this function
 * finds a corresponding UDP PCB and hands over the pbuf to the pcbs
 * recv function. If no pcb is found or the datagram is incorrect, the
 * pbuf is freed.
 *
 * @param p pbuf to be demultiplexed to a UDP PCB (p->payload pointing to the UDP header)
 * @param inp network interface on which the datagram was received.
 *
 */
void
udp_input(struct pbuf *p, struct netif *inp)
{
    struct udp_hdr *udphdr;
    /* ... */
    udphdr = (struct udp_hdr *)p->payload;
    /* ... */
}
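where struct udp_hdr is a packed representation of a UDP header and p->payload is of type void *. Going by my understanding and this answer, this is definitely [edit-not] breaking strict aliasing and therefore has undefined behavior.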