c - How should the [u]int_fastN_t types be defined for x86_64, with or without the x32 ABI?

Question

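The x32 ABI specifies, among other things, 32-bit pointers for code generated for the x86_64 architecture. It combines the advantages of the x86_64 architecture (including 64-bit CPU registers) with the reduced overhead of 32-bit pointers.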

The <stdint.h> header defines the typedefs int_fast8_t, int_fast16_t, int_fast32_t, and int_fast64_t (and the corresponding unsigned types uint_fast8_t, etc.), each of which is:

an integer type that is usually fastest to operate with among all integer types that have at least the specified width

with a footnote:

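The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.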

(Quotes are from the N1570 C11 draft.)

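The question is, how should the [u]int_fast16_t and [u]int_fast32_t types be defined for the x86_64 architecture, with or without the x32 ABI? Is there an x32 document that specifies these types? Should they be compatible with the 32-bit x86 definitions (both 32 bits), or, since x32 has access to 64-bit CPU registers, should they be the same size with or without the x32 ABI? (Note that x86_64 has 64-bit registers regardless of whether the x32 ABI is in use.)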

Here is a test program (which depends on the gcc-specific __x86_64__ macro):

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void) {
#if defined __x86_64__ && SIZE_MAX == 0xFFFFFFFF
    puts("This is x86_64 with the x32 ABI");
#elif defined __x86_64__ && SIZE_MAX > 0xFFFFFFFF
    puts("This is x86_64 without the x32 ABI");
#else
    puts("This is not x86_64");
#endif
    printf("uint_fast8_t  is %2zu bits\n", CHAR_BIT * sizeof (uint_fast8_t));
    printf("uint_fast16_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast16_t));
    printf("uint_fast32_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast32_t));
    printf("uint_fast64_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast64_t));
}

When I compile it with gcc -m64, the output is:

This is x86_64 without the x32 ABI
uint_fast8_t  is  8 bits
uint_fast16_t is 64 bits
uint_fast32_t is 64 bits
uint_fast64_t is 64 bits

When I compile it with gcc -mx32, the output is:

This is x86_64 with the x32 ABI
uint_fast8_t  is  8 bits
uint_fast16_t is 32 bits
uint_fast32_t is 32 bits
uint_fast64_t is 64 bits

(which, apart from the first line, matches the output with gcc -m32, which generates 32-bit x86 code).

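Is this a bug in glibc (which defines the <stdint.h> header), or is it following some x32 ABI requirement? Neither the x32 ABI document nor the x86_64 ABI document mentions the [u]int_fastN_t types, but something else might specify them.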

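One could argue that the fast16 and fast32 types should be 64 bits with or without x32, since 64-bit registers are available; would that make more sense than the current behavior?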

(I have substantially edited the original question, which asked only about the x32 ABI. The question now asks about x86_64 with or without x32.)

score 1 · Accepted Answer

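Generally speaking, you would expect 32-bit integer types to be marginally faster than 64-bit integer types on x86-64 CPUs. This is partly because they use less memory, but also because 64-bit instructions require an extra prefix byte compared with their 32-bit counterparts. The 32-bit division instruction is significantly faster than the 64-bit one, but otherwise instruction execution latencies are the same.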

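It isn't normally necessary to extend 32-bit values when loading them into 64-bit registers. The CPU does automatically zero-extend the values in this case, but that is mainly a benefit because it avoids partial register stalls. What gets loaded into the upper part of the register matters less than the fact that the entire register is modified. The contents of the upper part don't matter because, when registers hold 32-bit types, they are normally used only with 32-bit instructions, which operate only on the lower 32 bits of the register.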

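The inconsistency between the sizes of the int_fast32_t types under the x32 and x86-64 ABIs is probably best justified by the fact that pointers are 64 bits wide under the x86-64 ABI. Whenever a 32-bit integer is added to a pointer, it has to be extended to the pointer width first, and with 64-bit pointers that makes extensions a much more likely occurrence than under x32.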
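To make the pointer-arithmetic point concrete, here is a minimal sketch. The function names sum_idx32 and sum_idx64 are hypothetical and not part of the original answer. With 64-bit pointers, the 32-bit signed index in the first function has to be widened before it takes part in the address computation, while the 64-bit index in the second already matches the pointer width. Whether a separate extension instruction actually shows up depends on the compiler and optimization level, so treat this as something to inspect rather than a guaranteed difference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the index is a 32-bit signed integer. With 64-bit
   pointers, i must be sign-extended to 64 bits before it is added to a. */
int64_t sum_idx32(const int32_t *a, int32_t n)
{
    assert(n >= 0); /* precondition of this sketch */
    int64_t total = 0;
    for (int32_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hypothetical helper: the index is a 64-bit integer, which already has
   the pointer width of the plain x86-64 ABI. */
int64_t sum_idx64(const int32_t *a, int64_t n)
{
    assert(n >= 0); /* precondition of this sketch */
    int64_t total = 0;
    for (int64_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions return the same result; the interesting part is the generated assembly, which can be compared by compiling with gcc -O2 -S under -m64 and, where the toolchain supports it, -mx32.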

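Another factor to consider is that the whole point of the x32 ABI is to get better performance by using smaller types. Any application that benefits from pointers and related types being smaller should also benefit from int_fast32_t being smaller.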
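A small sketch of the memory-footprint argument, using two hypothetical helpers (exact32_array_bytes and fast32_array_bytes) that are not from the original answer. Going by the sizes printed in the question, an array of 1000 int_fast32_t elements would occupy 8000 bytes with gcc -m64 and 4000 bytes with gcc -mx32, while the int32_t array is 4000 bytes in both cases. Other C libraries may choose different widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* int_fast32_t has at least 32 bits, so it can never be smaller than
   the exact-width int32_t. */
static_assert(sizeof(int_fast32_t) >= sizeof(int32_t),
              "int_fast32_t is narrower than int32_t");

/* Bytes needed for n elements of the exact-width type. */
size_t exact32_array_bytes(size_t n)
{
    return n * sizeof(int32_t);
}

/* Bytes needed for n elements of the "fast" type; the result depends on
   how the C library defines int_fast32_t for the target ABI. */
size_t fast32_array_bytes(size_t n)
{
    return n * sizeof(int_fast32_t);
}
```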

score 0 · Accepted Answer

I compiled the following sample code to inspect the code generated for a simple sum with different integer types:

#include <stdint.h>

typedef int16_t INT;
//typedef int32_t INT;
//typedef int64_t INT;

INT foo()
{
    volatile INT a = 1, b = 2;
    return a + b;
}

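Then I disassembled the code generated for each integer type. The compile command is gcc -Ofast -mx32 -c test.c. Note that in full 64-bit mode the generated code is almost the same, because there are no pointers in my code (it just uses %rsp instead of %esp).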

With int16_t it emits:

00000000 <foo>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   ba 02 00 00 00          mov    $0x2,%edx
   a:   67 66 89 44 24 fc       mov    %ax,-0x4(%esp)
  10:   67 66 89 54 24 fe       mov    %dx,-0x2(%esp)
  16:   67 0f b7 54 24 fc       movzwl -0x4(%esp),%edx
  1c:   67 0f b7 44 24 fe       movzwl -0x2(%esp),%eax
  22:   01 d0                   add    %edx,%eax
  24:   c3                      retq

With int32_t:

00000000 <foo>:
   0:   67 c7 44 24 f8 01 00 00 00  movl   $0x1,-0x8(%esp)
   9:   67 c7 44 24 fc 02 00 00 00  movl   $0x2,-0x4(%esp)
  12:   67 8b 54 24 f8              mov    -0x8(%esp),%edx
  17:   67 8b 44 24 fc              mov    -0x4(%esp),%eax
  1c:   01 d0                       add    %edx,%eax
  1e:   c3                          retq

And with int64_t:

00000000 <foo>:
   0:   67 48 c7 44 24 f0 01 00 00 00   movq   $0x1,-0x10(%esp)
   a:   67 48 c7 44 24 f8 02 00 00 00   movq   $0x2,-0x8(%esp)
  14:   67 48 8b 54 24 f0               mov    -0x10(%esp),%rdx
  1a:   67 48 8b 44 24 f8               mov    -0x8(%esp),%rax
  20:   48 01 d0                        add    %rdx,%rax
  23:   c3                              retq

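Now, I don't claim to know exactly why the compiler generates precisely this code (maybe the volatile keyword combined with an integer type that is not register-sized is not the best choice?). But from the generated code we can draw the following conclusions: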

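The slowest type is int16_t. It requires additional instructions to move the values around.
The fastest type is int32_t. Although the 32-bit and 64-bit versions have the same number of instructions, the 32-bit code is shorter in bytes, which makes it more cache friendly and therefore faster.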

So the natural choices for the fast types would be:

For int_fast16_t, choose int32_t.
For int_fast32_t, choose int32_t.
For int_fast64_t, choose int64_t (what else).
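Written out as typedefs, those choices might look like the sketch below. The my_ prefixed names are hypothetical, chosen to avoid clashing with the real <stdint.h> names; this is not how glibc defines the types. The static assertions only check the minimum-width requirement that the standard places on each fast type.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical typedefs mirroring the choices listed above. */
typedef int32_t my_int_fast16_t;
typedef int32_t my_int_fast32_t;
typedef int64_t my_int_fast64_t;

/* Each fast type must have at least the width its name promises. */
static_assert(sizeof(my_int_fast16_t) * CHAR_BIT >= 16, "fast16 too narrow");
static_assert(sizeof(my_int_fast32_t) * CHAR_BIT >= 32, "fast32 too narrow");
static_assert(sizeof(my_int_fast64_t) * CHAR_BIT >= 64, "fast64 too narrow");
```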

score -3 · Accepted Answer

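Tough. Let's just take int_fast8_t. If a developer uses a large array to store lots of 8-bit signed integers, then int8_t will be fastest because of caching. I'd say that using large arrays of int_fast8_t is probably a bad idea.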

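You would need to take a large code base and systematically replace int8_t, signed char, and plain char used as a signed type with int_fast8_t. Then benchmark the code with different typedefs for int_fast8_t and measure which is fastest.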

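Note that the behaviour of out-of-range values is going to change (strictly speaking this is implementation-defined rather than undefined behaviour). For example, assigning 255 will give a result of -1 if the type is int8_t, and 255 if the type is wider.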
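A minimal sketch of that difference, assuming a hypothetical wide_fast8_t typedef (plain int here) as a stand-in for an implementation whose int_fast8_t is wider than 8 bits. The function names are also hypothetical. Note that, going by the output in the question, glibc's 8-bit fast type is itself 8 bits wide, so on that implementation the real int_fast8_t would behave like int8_t. The conversion of 255 to int8_t is implementation-defined; GCC documents that it reduces the value modulo 2^N, which gives -1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an int_fast8_t that is wider than 8 bits. */
typedef int wide_fast8_t;

static_assert(sizeof(wide_fast8_t) > sizeof(int8_t),
              "stand-in must be wider than int8_t");

/* Store v in an 8-bit signed object and read it back. Out-of-range
   values are converted in an implementation-defined way. */
int store_in_int8(int v)
{
    int8_t x = (int8_t)v;
    return x;
}

/* Store v in the wider stand-in type; 255 fits, so it is preserved. */
int store_in_wide(int v)
{
    wide_fast8_t x = v;
    return x;
}
```

Code that relies on the 8-bit wraparound therefore changes meaning if the typedef is switched to a wider type, which is one more thing to check when benchmarking different definitions.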
