c - What is wrong with this C function that finds the machine's endianness at runtime?

Question

This is what I offered in an interview today.

int is_little_endian(void)
{
    union {
        long l;
        char c;
    } u;

    u.l = 1;

    return u.c == 1;
}

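My interviewer insisted that c and l are not guaranteed to begin at the same address, and that the union should therefore be changed to, say, char c[sizeof(long)], with the return value changed to u.c[0] == 1.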
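For comparison, the interviewer's suggestion might look roughly like the sketch below. The function name is_little_endian_array is made up for illustration; only the char c[sizeof(long)] member and the u.c[0] == 1 test come from the interviewer's description.

```c
/* Sketch of the interviewer's variant: the char member becomes an array
 * that covers every byte of the long, and byte 0 is inspected.
 * The function name is hypothetical. */
int is_little_endian_array(void)
{
    union {
        long l;
        char c[sizeof(long)];
    } u;

    u.l = 1;

    /* On a little-endian machine the least significant byte comes first. */
    return u.c[0] == 1;
}
```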

Is it correct that the members of a union might not begin at the same address?

score 8 · Accepted Answer

I was not sure about the members of the union, but SO came to the rescue.

The check can be better written as:

int is_bigendian(void) {
    const int i = 1;
    return (*(unsigned char*)&i) == 0;
}

By the way, the C FAQ shows both methods: How can I determine whether a machine's byte order is big-endian or little-endian?

score 6 · Accepted Answer

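You are right to question the claim that "members of a union might not begin at the same address". The relevant part of the standard is (6.7.2.1 paragraph 13):

The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

Basically, the start address of the union is guaranteed to be the same as the start address of each of its members. I believe (still looking for a reference) that a long is guaranteed to be larger than a char. If you assume this, then your solution should* be valid.

*I am still a little unsure because of some interesting wording around the representation of integer types, signed integer types in particular. Read 6.2.6.2 paragraphs 1 and 2 carefully.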
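The address guarantee quoted above can be observed directly. The following is only a sketch: the type name long_or_char and the function name members_share_address are invented for illustration, and a passing check on one compiler demonstrates the behavior rather than proving the rule.

```c
/* Hypothetical named union so that member addresses can be compared. */
union long_or_char {
    long l;
    char c;
};

/* Returns 1 if both members start at the address of the union itself. */
int members_share_address(void)
{
    union long_or_char u;

    u.l = 1;

    /* Convert everything to void * so the pointers can be compared. */
    return (void *)&u == (void *)&u.l
        && (void *)&u == (void *)&u.c;
}
```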

score 3 · Accepted Answer

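While your code will probably work with many compilers, the interviewer is right: how fields in a union or structure are aligned is entirely up to the compiler, and in this case the char could be placed either at the "beginning" or at the "end". The interviewer's code leaves no room for doubt and is guaranteed to work.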

score 1 · Accepted Answer

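The standard says that the offset of each item in a union is implementation-defined.

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. ISO/IEC 9899:1999, Representations of types, 6.2.6.1 paragraph 7 (PDF file)

So it is up to the compiler where to place the char relative to the long within the union; they are not guaranteed to have the same address.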

score 0 · Accepted Answer

I have a question about this...

How is

u.c[0] == anything

valid given:

union {
    long l;
    char c;
} u;

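How does [0] work on a char?

It seems to me that it would be equivalent to (*u.c + 0) == anything, which would be, well, nonsense, considering that the value of u.c, treated as a pointer, would be nonsense.

(Unless perhaps, as it occurs to me now, some HTML garbage code ate an ampersand in the original question...)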
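To make the point about the ampersand concrete: with a scalar char member, u.c[0] is rejected by the compiler, whereas taking the address first gives a valid subscript. A small sketch, with a made-up function name:

```c
/* With a scalar char member, u.c[0] does not compile because u.c is
 * neither a pointer nor an array. Taking its address first makes the
 * subscript valid. The function name is hypothetical. */
int first_byte_via_address(void)
{
    union {
        long l;
        char c;
    } u;

    u.l = 1;

    /* (&u.c)[0] means *(&u.c + 0), which designates u.c itself. */
    return (&u.c)[0] == u.c;
}
```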

score 0 · Accepted Answer

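While the interviewer is correct, and this is not guaranteed to work by the spec, none of the other answers is guaranteed to work either, because dereferencing a pointer after casting it to another type yields undefined behavior.

In practice, this (and the other answers) will always work, because all compilers allow transparent conversion between a pointer to a union and a pointer to a member of that union. A lot of ancient code would fail to work if they did not.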
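For anyone who would rather avoid both the union and the pointer cast, one alternative (not taken from the question or the other answers) is to copy the object representation into a byte array with memcpy. A minimal sketch, with a hypothetical function name:

```c
#include <string.h>  /* memcpy */

/* Copies the bytes of a long into an array and inspects the first one.
 * No pointer of one type is dereferenced as another type here. */
int is_little_endian_memcpy(void)
{
    const long value = 1;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);

    /* On a little-endian machine the least significant byte comes first. */
    return bytes[0] == 1;
}
```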

score 0 · Accepted Answer

Correct me if I'm wrong, but local variables are not initialized to 0.

Wouldn't this be better:

union {
    long l;
    char c;
} u={0,};

score 0 · Accepted Answer

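A point not yet mentioned is that the standard explicitly allows for the possibility that integer representations may contain padding bits. Personally, I wish the standard committee would allow a simple means for a program to specify certain expected behaviors, and require that any compiler either honor such specifications or refuse compilation; code which started with an "integers must not have padding bits" specification would be entitled to assume that to be the case.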

As it is, it would be perfectly legitimate (albeit odd) for an implementation to store 35-bit long values as four 9-bit characters in big-endian format, but use the LSB of the first byte as a parity bit. Under such an implementation, storing 1 into a long could cause the parity of the overall word to become odd, thus compelling the implementation to store a 1 into the parity bit.

To be sure, such behavior would be odd, but if architectures that use padding are sufficiently notable to justify explicit provisions in the standard, code which would break on such architectures can't really be considered truly "portable".

The code using union should work correctly on all architectures which can be simply described as "big-endian" or "little-endian" and do not use padding bits. It would be meaningless on some other architectures (and indeed the terms "big-endian" and "little-endian" could be meaningless too).
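In the absence of such a specification, a program can at least test for padding bits itself. The sketch below counts the value bits of unsigned long and compares them with the storage width. It uses unsigned long as a stand-in for long, which is an assumption on my part, and the function name is made up.

```c
#include <limits.h>  /* CHAR_BIT, ULONG_MAX */

/* Returns 1 if unsigned long has padding bits, 0 otherwise.
 * The function name is hypothetical. */
int unsigned_long_has_padding_bits(void)
{
    unsigned long max = ULONG_MAX;
    int value_bits = 0;

    /* ULONG_MAX has every value bit set, so shifting it down to zero
     * counts exactly the value bits. */
    while (max != 0) {
        value_bits++;
        max >>= 1;
    }

    /* Any storage bits beyond the value bits are padding. */
    return value_bits != (int)(sizeof(unsigned long) * CHAR_BIT);
}
```

A caller could refuse to run the endianness check, or report "unknown", when this returns nonzero.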
