c - Shifts, types, and sign extension in C

Question

I have the following code:

unsigned char chr = 234; // 1110 1010
unsigned long result = 0;
result = chr << 24;

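The result is now 18446744073340452864, which is 1111 1111 1111 1111 1111 1111 1111 1111 1110 1010 0000 0000 0000 0000 0000 0000 in binary.

Why is sign extension performed when chr is unsigned?

Also, if I change the shift from 24 to 8, the result is 59904, which is 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 1010 0000 0000 in binary. Why is there no extension here? (No shift of 23 or less gets sign-extended.)

On my current platform, sizeof(long) is 8.

What are the rules for automatic conversion to a larger type when shifting? It looks to me as if chr is converted to an unsigned type when the shift is 23 or less, and to a signed type when it is 24 or more. (And why does sign extension happen at all with a left shift?)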

score 3 · Accepted Answer

To understand this it's easiest to think in terms of values.

Each integer type has a fixed range of representable values. For example, unsigned char usually ranges from 0 to 255; other ranges are possible, and you can find your compiler's choice by checking UCHAR_MAX in <limits.h>.

When converting between integer types, if the value is representable in the destination type, then the result of the conversion is that value. (The bit pattern may differ, e.g. because of sign extension.)

If the value is not representable in the destination type then:

for signed destinations, the behaviour is implementation-defined (which may include raising a signal).
for unsigned destinations, the value is adjusted modulo the maximum value representable in the type, plus one.

Typical modern systems handle the signed out-of-range case by truncating the excess high-order bits and keeping the remaining bit pattern; the value becomes whatever that bit pattern represents in the destination type.
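The portable parts of these rules can be sketched in a few lines. The function names below are made up for illustration, and the 300 -> 44 case assumes an 8-bit unsigned char. The signed out-of-range case is left out on purpose, since its result is implementation-defined.

```c
#include <limits.h>

/* Value is representable in the destination: the value is preserved,
   even though the wider result has a different (sign-extended) bit pattern. */
long long widen(int v)
{
    return (long long)v;
}

/* Unsigned destination, value out of range: reduced modulo UCHAR_MAX + 1. */
unsigned char to_uchar(int v)
{
    return (unsigned char)v;
}

/* Unsigned destination, negative value: reduced modulo UINT_MAX + 1. */
unsigned int to_uint(int v)
{
    return (unsigned int)v;
}
```

For instance, to_uchar(-1) yields UCHAR_MAX and to_uint(-1) yields UINT_MAX on every conforming implementation, while widen(-5) is still -5.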

Moving onto your actual example.

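In C, there is something called the integer promotions. With << and >>, each operand is promoted on its own, and the result has the type of the promoted left operand; with the arithmetic operators, the promotions are followed by the usual arithmetic conversions, which bring both operands to a common type. The effect of the integer promotions is that any value of a type narrower than int is converted to the same value with type int (or unsigned int, in the rare case that int cannot represent every value of the original type).

Further, for a signed left operand with a non-negative value, << 24 is defined as multiplication by 2^24 in the type of the promoted left operand, with undefined behaviour if the result is not representable in that type. (Informally: shifting into the sign bit causes UB.)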
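One way to see the promotion at work is to ask the compiler for the type of an expression with C11 _Generic, which inspects the type without evaluating the expression. TYPE_NAME and the three functions below are hypothetical helpers written for this sketch; the "int" results assume the usual case where int can hold every unsigned char value.

```c
/* Hypothetical helper: maps the type of an expression to a string.
   _Generic does not evaluate its controlling expression. */
#define TYPE_NAME(x) _Generic((x),          \
    unsigned char: "unsigned char",         \
    int:           "int",                   \
    unsigned int:  "unsigned int",          \
    long:          "long",                  \
    unsigned long: "unsigned long",         \
    default:       "other")

/* The variable itself keeps its declared type. */
const char *type_of_plain(void)
{
    unsigned char chr = 234;
    return TYPE_NAME(chr);
}

/* The left operand of << is promoted, so the result is an int. */
const char *type_of_shift(void)
{
    unsigned char chr = 234;
    return TYPE_NAME(chr << 8);
}

/* Arithmetic operators promote both operands. */
const char *type_of_sum(void)
{
    unsigned char a = 1, b = 2;
    return TYPE_NAME(a + b);
}
```

Under those assumptions, type_of_plain() returns "unsigned char", while type_of_shift() and type_of_sum() both return "int".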

So, putting all the conversions explicitly, your code is

result = (unsigned long) ( ((int)chr) * 16777216 )

Now, the mathematical result of this calculation is 3925868544, which, if you are on a typical system with 32-bit int, is greater than INT_MAX (2147483647), so the behaviour is undefined.

If we want to explore results of this undefined behaviour on typical systems: what may happen is the same procedure I outlined earlier for out-of-range assignment. The bit-pattern of 3925868544 is of course 1110 1010 0000 0000 0000 0000 0000 0000. Treating this as the pattern of an int using 2's complement gives the int -369098752.

Finally we have the conversion of this value to unsigned long. -369098752 is out of range for unsigned long, and the rule for an unsigned destination is to adjust the value modulo ULONG_MAX+1. So the value you are seeing is 18446744073709551615 + 1 - 369098752 = 18446744073340452864.
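The numbers in this walk-through can be reproduced without invoking any undefined behaviour by doing the arithmetic in 64-bit types. This is only a sketch: the function names are invented here, and uint64_t stands in for the 64-bit unsigned long of the question's platform.

```c
#include <stdint.h>

/* 234 * 2^24 computed in 64 bits, where it cannot overflow. */
uint64_t exact_product(void)
{
    return (uint64_t)234 * 16777216u;
}

/* Value that a 32-bit pattern has when read as a two's-complement int:
   if bit 31 is set, subtract 2^32. Expects bits < 2^32. The subtraction
   is done in int64_t, so nothing overflows. */
int64_t as_twos_complement_32(uint64_t bits)
{
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL
                                : (int64_t)bits;
}

/* Conversion to a 64-bit unsigned type: the value is reduced modulo 2^64. */
uint64_t to_u64(int64_t v)
{
    return (uint64_t)v;
}
```

Chaining the three steps, exact_product() gives 3925868544, as_twos_complement_32 of that gives -369098752, and to_u64(-369098752) gives 18446744073340452864, the value reported in the question.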

If your intent was to do the calculation in unsigned long precision, you need to convert the left operand to unsigned long before shifting, e.g. ((unsigned long)chr) << 24. (Note: writing 24ul won't work; the type of the right-hand operand of << or >> does not affect the type of the result.)
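A minimal sketch of that fix follows. It uses uint64_t instead of unsigned long so the result does not depend on the platform's long width, and both function names are made up for this example. The second function only inspects a type through _Generic, so the problematic shift is never evaluated.

```c
#include <stdint.h>

/* Widen the left operand first; the shift is then done in 64-bit
   unsigned arithmetic. Requires n < 64. */
uint64_t shift_widened(unsigned char chr, unsigned n)
{
    return (uint64_t)chr << n;
}

/* Returns 1 if chr << 24ul still has type int, i.e. the unsigned long
   right operand did not change the result type. Assumes unsigned char
   promotes to int. The shift is not evaluated. */
int right_operand_is_ignored(unsigned char chr)
{
    return _Generic(chr << 24ul, int: 1, default: 0);
}
```

With chr = 234, shift_widened(chr, 24) returns 3925868544 with no sign extension, and right_operand_is_ignored(chr) returns 1.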

score 3 · Accepted Answer

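With chr = 234, the expression chr << 24 is evaluated on its own: chr is promoted to a (32-bit signed) int and shifted left by 24 bits, producing a negative int value. When you assign that to a 64-bit unsigned long, the sign bit is propagated through the most significant 32 bits of the 64-bit value. Note that the way chr << 24 is computed is not affected by the type it is assigned to.

When the shift is only 8 bits, the result is a positive (signed 32-bit) int, and its sign bit (0) is propagated through the upper bits of the unsigned long.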
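The cutoff between 23 and 24 that the question observed can be checked directly: the promoted shift is well defined only while chr * 2^n still fits in int. The helper below is hypothetical and assumes int is at most 32 bits wide; it does the comparison in 64-bit unsigned arithmetic so the check itself cannot overflow.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if chr << n is well defined after chr is promoted to int,
   0 otherwise. Assumes int is no wider than 32 bits. */
int shift_fits_in_int(unsigned char chr, unsigned n)
{
    if (n >= sizeof(int) * CHAR_BIT)
        return 0;   /* shift count not less than the width of int */
    return ((uint64_t)chr << n) <= (uint64_t)INT_MAX;
}
```

On a platform with 32-bit int, shift_fits_in_int(234, 23) is 1 because 234 * 2^23 = 1962934272 is below INT_MAX, while shift_fits_in_int(234, 24) is 0.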
