c++ - How can an 8-bit integer end up holding a value larger than 8 bits?

Question

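I tracked down an extremely nasty bug hiding behind this little gem. I know that, per the C++ specification, signed overflow is undefined behavior, but overflow can only occur once the value is extended to the bit width of sizeof(int). As I understand it, incrementing a char should never be undefined behavior as long as sizeof(char) < sizeof(int). But that doesn't explain how c is getting an impossible value. As an 8-bit integer, how can c hold values wider than its bit width?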

Code

// Compiled with gcc-4.7.2
#include <cstdio>
#include <stdint.h>
#include <climits>

int main()
{
   int8_t c = 0;
   printf("SCHAR_MIN: %i\n", SCHAR_MIN);
   printf("SCHAR_MAX: %i\n", SCHAR_MAX);

   for (int32_t i = 0; i <= 300; i++)
      printf("c: %i\n", c--);

   printf("c: %i\n", c);

   return 0;
}

Output

SCHAR_MIN: -128
SCHAR_MAX: 127
c: 0
c: -1
c: -2
c: -3
...
c: -127
c: -128  // <= The next value should still be an 8-bit value.
c: -129  // <= What? That's more than 8 bits!
c: -130  // <= Uh...
c: -131
...
c: -297
c: -298  // <= Getting ridiculous now.
c: -299
c: -300
c: -45   // <= ..........

See it on ideone.

score 111 · Accepted Answer

This is a compiler bug.

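Although getting impossible results is a valid outcome of undefined behavior, there is actually no undefined behavior in your code. What is happening is that the compiler thinks the behavior is undefined and optimizes accordingly.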

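If c is defined as int8_t, and int8_t promotes to int, then c-- is supposed to perform the subtraction c - 1 in int arithmetic and convert the result back to int8_t. The subtraction in int does not overflow, and converting an out-of-range integral value to another integral type is valid. If the destination type is signed, the result is implementation-defined, but it must be a valid value of the destination type. (If the destination type were unsigned, the result would be well-defined, but that does not apply here.)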
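A minimal sketch of the steps described above, assuming C++11 or later: promote c to int, subtract 1 in int, then convert back to int8_t. The helper post_decrement is hypothetical. Before C++20 the out-of-range conversion is implementation-defined (GCC documents modulo-2^8 wrapping); since C++20 it is defined as modular. In every case the stored value is a valid int8_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper spelling out what c-- is specified to do for an int8_t.
// Returns the old value (postfix semantics) and updates c in place.
std::int8_t post_decrement(std::int8_t& c) {
    std::int8_t old = c;                    // postfix yields the value before the update
    int promoted = c;                       // integral promotion: int8_t -> int
    int result = promoted - 1;              // int arithmetic: cannot overflow for any int8_t
    c = static_cast<std::int8_t>(result);   // convert back: always lands in [-128, 127]
    assert(c >= -128 && c <= 127);          // postcondition that the buggy build violates
    return old;
}
```

Starting from c == -128, this returns -128 and leaves 127 in c on GCC and Clang, which is the wrap-around the c = c - 1 experiment in a later answer shows.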

score 15 · Accepted Answer

A compiler can have bugs which are other than nonconformances to the standard, because there are other requirements. A compiler should be compatible with other versions of itself. It may also be expected to be compatible in some ways with other compilers, and also to conform to some beliefs about behavior that are held by the majority of its user base.

In this case, it appears to be a conformance bug. The expression c-- should manipulate c in a way similar to c = c - 1. Here, the value of c on the right is promoted to type int, and then the subtraction takes place. Since c is in the range of int8_t, this subtraction will not overflow, but it may produce a value which is out of the range of int8_t. When this value is assigned, a conversion takes place back to the type int8_t so the result fits back into c. In the out-of-range case, the conversion has an implementation-defined value. But a value out of the range of int8_t is not a valid implementation-defined value. An implementation cannot "define" that an 8-bit type suddenly holds 9 or more bits. For the value to be implementation-defined means that something in the range of int8_t is produced, and the program continues. The C standard thereby allows for behaviors such as saturation arithmetic (common on DSPs) or wrap-around (mainstream architectures).

The compiler is using a wider underlying machine type when manipulating values of small integer types like int8_t or char. When arithmetic is performed, results which are out of range of the small integer type can be captured reliably in this wider type. To preserve the externally visible behavior that the variable is an 8 bit type, the wider result has to be truncated into the 8 bit range. Explicit code is required to do that since the machine storage locations (registers) are wider than 8 bits and happy with the larger values. Here, the compiler neglected to normalize the value and simply passed it to printf as is. The conversion specifier %i in printf has no idea that the argument originally came from int8_t calculations; it is just working with an int argument.
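A minimal sketch of the normalization step described above, assuming a 32-bit int and an 8-bit two's-complement target. normalize_to_int8 is a hypothetical helper: it keeps the low byte of the wide value and sign-extends it, which is the job movsx does in the correct assembly quoted in a later answer.

```cpp
#include <cassert>

// Hypothetical helper: reduce a value held in a wide register to what an
// 8-bit two's-complement variable would hold (low byte, sign-extended).
int normalize_to_int8(int wide) {
    // Converting to unsigned is well-defined (modulo 2^32), so masking is safe.
    unsigned low = static_cast<unsigned>(wide) & 0xFFu;
    // Sign-extend: bytes 0x80..0xFF represent -128..-1.
    int result = low >= 0x80u ? static_cast<int>(low) - 256 : static_cast<int>(low);
    assert(result >= -128 && result <= 127);
    return result;
}
```

For example, the out-of-range -129 printed by the buggy build normalizes to 127, and -301 normalizes to -45.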

score 14 · Accepted Answer

I couldn't fit this into a comment, so I'm posting it as an answer.

For some very strange reason, the -- operator happens to be the culprit.

I tested the code posted on Ideone and replaced c-- with c = c - 1, and the values stayed within the range [-128 ... 127]:

c: -123
c: -124
c: -125
c: -126
c: -127
c: -128 // about to overflow
c: 127  // woop
c: 126
c: 125
c: 124
c: 123
c: 122

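Freaky, eh? I don't know much about what the compiler does with expressions like i++ or i--. It may be promoting the return value to an int and passing that. That's the only logical conclusion I can come to, because you are in fact getting values that cannot fit into 8 bits.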
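The promotion guess can be checked with type traits. A minimal sketch, assuming C++11 or later: postfix -- on an int8_t lvalue yields an int8_t, and promotion to int happens when the value passes through printf's variadic parameter list (the default argument promotions). That promotion preserves the value, so by itself it cannot produce anything outside [-128, 127]. The helper promote is hypothetical.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Postfix -- on an int8_t lvalue yields the old value as an int8_t prvalue, not an int.
static_assert(std::is_same<decltype(std::declval<std::int8_t&>()--), std::int8_t>::value,
              "c-- has type int8_t");

// Unary + performs the same integral promotion that a variadic call applies.
static_assert(std::is_same<decltype(+std::declval<std::int8_t>()), int>::value,
              "promoted type is int");

// Hypothetical helper: promotion keeps the value, so the range is preserved.
constexpr int promote(std::int8_t v) { return +v; }
static_assert(promote(INT8_MIN) == -128 && promote(INT8_MAX) == 127, "range preserved");
```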

score 12 · Accepted Answer

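I'm guessing that the underlying hardware is still using a 32-bit register to hold that int8_t. Since the specification does not impose any behavior on overflow, the implementation doesn't check for overflow and allows larger values to be stored as well.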

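If you mark the local variable as volatile, you force the compiler to use memory for it, and consequently you get the expected values within the range.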
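A minimal sketch of the volatile workaround, assuming C++11 or later. decrement_sequence_volatile is a hypothetical helper that records what the question's loop would print instead of calling printf. It spells the decrement as a read followed by a store because ++ and -- on volatile objects are deprecated in C++20; each store goes through the 8-bit object, so the value is truncated on every iteration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the values the question's loop would print
// when c is declared volatile.
std::vector<int> decrement_sequence_volatile(int iterations) {
    assert(iterations >= 0);
    volatile std::int8_t c = 0;
    std::vector<int> printed;
    for (int i = 0; i < iterations; ++i) {
        std::int8_t old = c;                    // one volatile read
        c = static_cast<std::int8_t>(old - 1);  // one volatile 8-bit store
        printed.push_back(old);                 // what printf("c: %i\n", c--) shows
    }
    printed.push_back(c);                       // the final printf("c: %i\n", c)
    return printed;
}
```

With 301 iterations (i from 0 to 300, as in the question), the sequence runs 0 down to -128, wraps to 127, and ends at -45.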

score 11 · Accepted Answer

The assembly code reveals the problem:

:loop
mov esi, ebx
xor eax, eax
mov edi, OFFSET FLAT:.LC2   ;"c: %i\n"
sub ebx, 1
call    printf
cmp ebx, -301
jne loop

mov esi, -45
mov edi, OFFSET FLAT:.LC2   ;"c: %i\n"
xor eax, eax
call    printf

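EBX should be ANDed with FF after the decrement, or only BL should be used with the rest of EBX cleared. Curious that it uses sub instead of dec. The -45 is downright mysterious. It's the bitwise inversion of 300 & 255 = 44: -45 = ~44. There's a connection somewhere.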
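The connection can be checked at compile time. A minimal sketch, assuming two's-complement int (guaranteed since C++20, and what GCC uses in practice): the loop decrements c 301 times, so the untruncated value is -301, and in two's complement ~x == -x - 1, so -301 == ~300. The low byte of ~300 is therefore ~(300 & 255) == ~44, which read as a signed byte is -45. The name wide_final is hypothetical.

```cpp
#include <cstdint>

// Hypothetical constant: the untruncated value after 301 decrements from 0.
constexpr int wide_final = 0 - 301;

// Two's-complement identity ~x == -x - 1, hence ~300 == -301.
static_assert(~300 == wide_final, "~300 is -301");

// The low byte of 300 is 44 (0x2C); the low byte of -301 is its complement, 0xD3.
static_assert((300 & 0xFF) == 44, "300 & 255 is 44");
static_assert((static_cast<unsigned>(wide_final) & 0xFFu) == 0xD3u, "low byte is 0xD3");

// Read as a signed 8-bit value, 0xD3 is ~44 == -45: the constant in the last printf.
static_assert(static_cast<std::int8_t>(wide_final) == ~44, "as int8_t it is -45");
```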

With c = c - 1 it does a lot more work:

mov eax, ebx
mov edi, OFFSET FLAT:.LC2   ;"c: %i\n"
add ebx, 1
not eax
movsx   ebp, al                 ;uses only the lower 8 bits
xor eax, eax
mov esi, ebp

Then it uses only the low part of RAX, so the value is restricted to -128 to 127. Compiler options: "-g -O2".

Without optimization, it produces correct code:

movzx   eax, BYTE PTR [rbp-1]
sub eax, 1
mov BYTE PTR [rbp-1], al
movsx   edx, BYTE PTR [rbp-1]
mov eax, OFFSET FLAT:.LC2   ;"c: %i\n"
mov esi, edx

So this is a bug in the optimizer.

score 4 · Accepted Answer

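Use %hhd instead of %i! That should solve your problem.

What you're seeing is the result of compiler optimization: you tell printf to print a 32-bit number and then push a (supposedly 8-bit) number onto the stack, which is actually pointer-sized, because that's how the push opcode works on x86.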
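A minimal sketch of the %hhd suggestion, assuming a C library that follows the C standard's rule that, with the hh length modifier, the promoted int argument is converted to signed char before printing. format_hhd is a hypothetical helper that uses std::snprintf so the text can be inspected instead of printed. This changes how the argument is displayed; it does not change the value the optimized code passes.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format an int8_t with %hhd instead of %i.
std::string format_hhd(std::int8_t value) {
    char buf[8];  // "-128" plus the terminator fits
    std::snprintf(buf, sizeof buf, "%hhd", value);  // value is promoted to int, printed as signed char
    return std::string(buf);
}
```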

score 3 · Accepted Answer

I think this is caused by the optimization of this code:

for (int32_t i = 0; i <= 300; i++)
      printf("c: %i\n", c--);

The compiler uses the int32_t i variable for both i and c. Turn off optimization, or do a direct cast: printf("c: %i\n", (int8_t)c--);

score 1 · Accepted Answer

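c itself is defined as int8_t, but when ++ or -- operates on it, it is first implicitly converted to int, and printf prints the result of that operation rather than the internal value of c; that result happens to be an int.

Look at the actual value of c after the whole loop, especially after the last decrement: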

-301 + 256 = -45 (since it revolved entire 8 bit range once)

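That is the correct value, analogous to the wrap-around -128 - 1 = 127.

c starts out using int-sized storage: when it is printed as an int8_t, only 8 bits are used, but when it is used as an int, all 32 bits are used.

[compiler bug]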

score 0 · Accepted Answer

I think this is because your loop keeps going until int i becomes 300 and c becomes -300. The last value is because of

printf("c: %i\n", c);
