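linux - What are the calling conventions for UNIX and Linux system calls (and user-space functions) on i386 and x86-64?

Question

The following links explain the x86-32 system call conventions for both UNIX (BSD flavor) and Linux:

But what are the x86-64 system call conventions on UNIX and Linux?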

score 275 · Accepted Answer

Further reading for any of the topics here: The Definitive Guide to Linux System Calls

I verified these using the GNU Assembler (gas) on Linux.

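Kernel Interface

x86-32 aka i386 Linux system call convention:

In x86-32, parameters for Linux system calls are passed in registers. %eax holds the syscall number. %ebx, %ecx, %edx, %esi, %edi and %ebp are used to pass up to 6 parameters to the system call.

The return value is in %eax. All other registers (including EFLAGS) are preserved across the int $0x80.

I took the following snippet from the Linux Assembly Tutorial, but I'm doubtful about it. If anyone can show an example, that would be great.

If there are more than six arguments, %ebx must contain the memory location where the list of arguments is stored - but don't worry about this, because it's unlikely that you'll use a syscall with more than six arguments.

For an example and a little more reading, refer to http://www.int80h.org/bsdasm/#alternate-calling-convention. Another Hello World example for i386 Linux using int 0x80: Hello, world in assembly language with Linux system calls?

There is a faster way to make 32-bit system calls: using sysenter. The kernel maps a page of memory into every process (the vDSO) that contains the user-space side of the sysenter dance, which has to cooperate with the kernel so that it can find the return address. The argument-to-register mapping is the same as for int $0x80. You should normally call into the vDSO instead of using sysenter directly. (See The Definitive Guide to Linux System Calls for information on linking and calling into the vDSO, more about sysenter, and everything else to do with system calls.)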

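x86-32 [Free|Open|Net|DragonFly]BSD UNIX system call convention:

Parameters are passed on the stack. Push the parameters onto the stack, last parameter first. Then push an additional 32 bits of dummy data (it's not actually dummy data; refer to the following link for more information) and then execute the system call instruction int $0x80.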

http://www.int80h.org/bsdasm/#default-calling-convention

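x86-64 Linux system call convention:

(Note: x86-64 Mac OS X is similar to but different from Linux. TODO: check what *BSD does)

Refer to section "A.2 AMD64 Linux Kernel Conventions" of the System V Application Binary Interface AMD64 Architecture Processor Supplement. The latest versions of the i386 and x86-64 System V psABIs can be found linked from this page in the ABI maintainer's repo. (See also the x86 tag wiki for up-to-date ABI links and lots of other good material about x86 asm.)

Here is the snippet from this section:

User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.

A system call is done via the syscall instruction. This clobbers %rcx and %r11 as well as the %rax return value, but other registers are preserved.

The number of the syscall has to be passed in register %rax.

System calls are limited to six arguments; no argument is passed directly on the stack.

Returning from the syscall, register %rax contains the result of the system call. A value in the range between -4095 and -1 indicates an error; it is -errno.

Only values of class INTEGER or class MEMORY are passed to the kernel.

Remember that this is from the Linux-specific appendix to the ABI, and even for Linux it is informative, not normative. (But it is in fact accurate.)

The 32-bit int $0x80 ABI is usable in 64-bit code (but strongly not recommended). See: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? It still truncates its inputs to 32 bits, so it is unsuitable for pointers, and it zeros r8-r11.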
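The quoted rules can be sketched in GNU C inline assembly. This is only an illustration under stated assumptions: it needs GCC or Clang on x86-64 Linux, and the helper names raw_syscall1 and decode_result are made up here; they are not part of the ABI document or of any library.

```c
#include <errno.h>
#include <sys/syscall.h>   /* SYS_close, SYS_getpid */

/* Hypothetical helper: issue a one-argument system call directly.
   rax carries the syscall number in and the result out, rdi carries the
   first argument. The syscall instruction itself overwrites rcx and r11,
   so both are listed as clobbers. */
long raw_syscall1(long number, long arg1)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (number), "D" (arg1)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Hypothetical helper: translate the raw kernel result into the usual libc
   convention. A value in [-4095, -1] is -errno; anything else is a
   successful result. */
long decode_result(long ret)
{
    if (ret >= -4095 && ret <= -1) {
        errno = (int) -ret;
        return -1;
    }
    return ret;
}
```

For instance, raw_syscall1(SYS_close, -1) should come back as -EBADF, which decode_result turns into a -1 return with errno set to EBADF.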

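User Interface: function calling

x86-32 function calling convention:

In x86-32, parameters are passed on the stack. The last parameter is pushed first, and so on until all parameters have been pushed, and then the call instruction is executed. This is how C library (libc) functions are called from assembly on Linux.

Modern versions of the i386 System V ABI (used on Linux) require 16-byte alignment of %esp before a call, as the x86-64 System V ABI has always required. Callees are allowed to assume this and to use SSE 16-byte loads/stores that fault on unaligned addresses. Historically, however, Linux only required 4-byte stack alignment, so it took extra work to reserve naturally aligned space even for an 8-byte double or similar.

Some other modern 32-bit systems still do not require more than 4-byte stack alignment.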

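x86-64 System V user-space function calling convention:

x86-64 System V passes args in registers, which is more efficient than i386 System V's stack-args convention. It avoids the latency and extra instructions of storing args to memory (cache) and then loading them back again in the callee. This works well because there are more registers available, and it is better for modern high-performance CPUs where latency and out-of-order execution matter. (The i386 ABI is very old.)

In this new mechanism, the parameters are first divided into classes. The class of each parameter determines how it is passed to the called function.

For complete information refer to "3.2 Function Calling Sequence" of the System V Application Binary Interface AMD64 Architecture Processor Supplement, which reads, in part:

Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:

If the class is MEMORY, pass the argument on the stack.

If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used

So %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are, in order, the registers used to pass integer/pointer (i.e. INTEGER class) parameters to any libc function from assembly. %rdi is used for the first INTEGER parameter, %rsi for the second, %rdx for the third, and so on. Then the call instruction should be executed. The stack (%rsp) must be 16-byte aligned when call executes.

If there are more than 6 INTEGER parameters, the 7th and later INTEGER parameters are passed on the stack. (The caller pops them, same as x86-32.)

The first 8 floating-point args are passed in %xmm0-7, later ones on the stack. There are no call-preserved vector registers. (A function with a mix of FP and integer arguments can have more than 8 register arguments in total.)

Variadic functions (like printf) always need %al = the number of FP register args.

There are rules for when structs are packed into registers (rdx:rax on return) versus passed in memory. See the ABI for details, and check compiler output to make sure your code agrees with compilers about how something should be passed or returned.

Note that the Windows x64 function calling convention has multiple significant differences from x86-64 System V, such as shadow space that must be reserved by the caller (instead of a red zone) and call-preserved xmm6-xmm15. The rules for which arg goes in which register are also very different.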
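One way to see these rules in action is to write the callee in assembly and let the C compiler act as the caller. The sketch below assumes GCC or Clang targeting x86-64 System V on an ELF platform; the function names fourth_arg, seventh_arg and entry_rsp_mod16 are invented for this illustration.

```c
/* Three tiny callees in AT&T syntax, placed in .text via a top-level asm
   statement. In top-level (basic) asm, % and $ are written literally. */
__asm__(
    ".pushsection .text\n"

    ".globl fourth_arg\n"
    ".type fourth_arg, @function\n"
    "fourth_arg:\n"
    "    mov %rcx, %rax\n"        /* 4th INTEGER argument arrives in %rcx */
    "    ret\n"
    ".size fourth_arg, .-fourth_arg\n"

    ".globl seventh_arg\n"
    ".type seventh_arg, @function\n"
    "seventh_arg:\n"
    "    mov 8(%rsp), %rax\n"     /* (%rsp) holds the return address; the
                                     7th argument sits just above it */
    "    ret\n"
    ".size seventh_arg, .-seventh_arg\n"

    ".globl entry_rsp_mod16\n"
    ".type entry_rsp_mod16, @function\n"
    "entry_rsp_mod16:\n"
    "    mov %rsp, %rax\n"
    "    and $15, %rax\n"         /* %rsp was 16-byte aligned before call,
                                     so it is off by 8 on entry */
    "    ret\n"
    ".size entry_rsp_mod16, .-entry_rsp_mod16\n"

    ".popsection\n"
);

/* C prototypes so the compiler sets up the registers and stack for us. */
long fourth_arg(long a, long b, long c, long d, long e, long f, long g, long h);
long seventh_arg(long a, long b, long c, long d, long e, long f, long g, long h);
long entry_rsp_mod16(void);
```

Called as fourth_arg(1, 2, 3, 4, 5, 6, 7, 8) and seventh_arg(1, 2, 3, 4, 5, 6, 7, 8) from compiler-generated code, these should return 4 and 7 respectively, and entry_rsp_mod16() should return 8 because call has just pushed the 8-byte return address.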
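The integer and floating-point register sequences are consumed independently, which a small sketch can make visible. As before, this assumes GCC or Clang on x86-64 System V with ELF, and the names second_long and second_double are hypothetical.

```c
/* Both callees share the signature (long, double, long, double).
   The two longs take %rdi and %rsi; the two doubles take %xmm0 and %xmm1,
   regardless of how the argument kinds are interleaved. */
__asm__(
    ".pushsection .text\n"

    ".globl second_long\n"
    ".type second_long, @function\n"
    "second_long:\n"
    "    mov %rsi, %rax\n"         /* 2nd INTEGER argument, integer result in %rax */
    "    ret\n"
    ".size second_long, .-second_long\n"

    ".globl second_double\n"
    ".type second_double, @function\n"
    "second_double:\n"
    "    movapd %xmm1, %xmm0\n"    /* 2nd FP argument, double result in %xmm0 */
    "    ret\n"
    ".size second_double, .-second_double\n"

    ".popsection\n"
);

long   second_long(long a, double x, long b, double y);
double second_double(long a, double x, long b, double y);
```

With the arguments (1, 2.0, 3, 4.0), second_long should return 3 and second_double should return 4.0. The %al rule only matters when the callee is variadic, so it does not come into play here.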
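As a small illustration of the rdx:rax case, a 16-byte struct made of two longs is classified as two INTEGER eightbytes and comes back in %rax (first member) and %rdx (second member). The sketch assumes GCC or Clang on x86-64 System V with ELF; struct pair and make_pair are names invented for this example.

```c
struct pair {
    long first;
    long second;
};

__asm__(
    ".pushsection .text\n"
    ".globl make_pair\n"
    ".type make_pair, @function\n"
    "make_pair:\n"
    "    mov %rdi, %rax\n"    /* first eightbyte of the returned struct  */
    "    mov %rsi, %rdx\n"    /* second eightbyte of the returned struct */
    "    ret\n"
    ".size make_pair, .-make_pair\n"
    ".popsection\n"
);

/* The compiler reads the result from %rax and %rdx when it sees this type. */
struct pair make_pair(long first, long second);
```

Larger structs (more than 16 bytes) are returned in memory through a hidden pointer instead, so this shortcut does not generalize; comparing against compiler output remains the reliable check.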

score 15 · Accepted Answer

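Perhaps you're looking for the x86_64 ABI?

www.x86-64.org/documentation/abi.pdf (404 as of 2018-11-24)
www.x86-64.org/documentation/abi.pdf (via the Wayback Machine, as of 2018-11-24)
Where is the x86-64 System V ABI documented? - https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI is kept up to date (by HJ Lu, one of the ABI maintainers) and links to PDFs of the current official version.

If that's not what you're after, use "x86_64 abi" in your preferred search engine to find alternative references.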

score 15 · Accepted Answer

Linux kernel 5.0 source comments

I knew that x86 specifics are under arch/x86, and that syscall stuff goes under arch/x86/entry. So a quick git grep rdi in that directory leads me to arch/x86/entry/entry_64.S:

/*
 * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
 *
 * This is the only entry point used for 64-bit system calls.  The
 * hardware interface is reasonably well designed and the register to
 * argument mapping Linux uses fits well with the registers that are
 * available when SYSCALL is used.
 *
 * SYSCALL instructions can be found inlined in libc implementations as
 * well as some other programs and libraries.  There are also a handful
 * of SYSCALL instructions in the vDSO used, for example, as a
 * clock_gettimeofday fallback.
 *
 * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
 * then loads new ss, cs, and rip from previously programmed MSRs.
 * rflags gets masked by a value from another MSR (so CLD and CLAC
 * are not needed). SYSCALL does not save anything on the stack
 * and does not change rsp.
 *
 * Registers on entry:
 * rax  system call number
 * rcx  return address
 * r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
 * rdi  arg0
 * rsi  arg1
 * rdx  arg2
 * r10  arg3 (needs to be moved to rcx to conform to C ABI)
 * r8   arg4
 * r9   arg5
 * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
 *
 * Only called from user space.
 *
 * When user can change pt_regs->foo always force IRET. That is because
 * it deals with uncanonical addresses better. SYSRET has trouble
 * with them due to bugs in both AMD and Intel CPUs.
 */

And for 32-bit, in arch/x86/entry/entry_32.S:

/*
 * 32-bit SYSENTER entry.
 *
 * 32-bit system calls through the vDSO's __kernel_vsyscall enter here
 * if X86_FEATURE_SEP is available.  This is the preferred system call
 * entry on 32-bit systems.
 *
 * The SYSENTER instruction, in principle, should *only* occur in the
 * vDSO.  In practice, a small number of Android devices were shipped
 * with a copy of Bionic that inlined a SYSENTER instruction.  This
 * never happened in any of Google's Bionic versions -- it only happened
 * in a narrow range of Intel-provided versions.
 *
 * SYSENTER loads SS, ESP, CS, and EIP from previously programmed MSRs.
 * IF and VM in RFLAGS are cleared (IOW: interrupts are off).
 * SYSENTER does not save anything on the stack,
 * and does not save old EIP (!!!), ESP, or EFLAGS.
 *
 * To avoid losing track of EFLAGS.VM (and thus potentially corrupting
 * user and/or vm86 state), we explicitly disable the SYSENTER
 * instruction in vm86 mode by reprogramming the MSRs.
 *
 * Arguments:
 * eax  system call number
 * ebx  arg1
 * ecx  arg2
 * edx  arg3
 * esi  arg4
 * edi  arg5
 * ebp  user stack
 * 0(%ebp) arg6
 */

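glibc 2.29 Linux x86_64 system call implementation

Now let's cheat by looking at a major libc implementation and seeing what it does.

What could be better than looking into the glibc that I'm using right now as I write this answer? :-)

glibc 2.29 defines x86_64 syscalls in sysdeps/unix/sysv/linux/x86_64/sysdep.h, which contains some interesting code, e.g.: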

/* The Linux/x86-64 kernel expects the system call parameters in
   registers according to the following table:

    syscall number  rax
    arg 1       rdi
    arg 2       rsi
    arg 3       rdx
    arg 4       r10
    arg 5       r8
    arg 6       r9

    The Linux kernel uses and destroys internally these registers:
    return address from
    syscall     rcx
    eflags from syscall r11

    Normal function call, including calls to the system call stub
    functions in the libc, get the first six parameters passed in
    registers and the seventh parameter and later on the stack.  The
    register use is as follows:

     system call number in the DO_CALL macro
     arg 1      rdi
     arg 2      rsi
     arg 3      rdx
     arg 4      rcx
     arg 5      r8
     arg 6      r9

    We have to take care that the stack is aligned to 16 bytes.  When
    called the stack is not aligned since the return address has just
    been pushed.


    Syscalls of more than 6 arguments are not supported.  */

and:

/* Registers clobbered by syscall.  */
# define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"

#undef internal_syscall6
#define internal_syscall6(number, err, arg1, arg2, arg3, arg4, arg5, arg6) \
({                                  \
    unsigned long int resultvar;                    \
    TYPEFY (arg6, __arg6) = ARGIFY (arg6);              \
    TYPEFY (arg5, __arg5) = ARGIFY (arg5);              \
    TYPEFY (arg4, __arg4) = ARGIFY (arg4);              \
    TYPEFY (arg3, __arg3) = ARGIFY (arg3);              \
    TYPEFY (arg2, __arg2) = ARGIFY (arg2);              \
    TYPEFY (arg1, __arg1) = ARGIFY (arg1);              \
    register TYPEFY (arg6, _a6) asm ("r9") = __arg6;            \
    register TYPEFY (arg5, _a5) asm ("r8") = __arg5;            \
    register TYPEFY (arg4, _a4) asm ("r10") = __arg4;           \
    register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;           \
    register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;           \
    register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;           \
    asm volatile (                          \
    "syscall\n\t"                           \
    : "=a" (resultvar)                          \
    : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),     \
      "r" (_a5), "r" (_a6)                      \
    : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);            \
    (long int) resultvar;                       \
})

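which I feel is pretty self-explanatory. Note how this seems to have been designed to match the calling convention of regular System V AMD64 ABI functions as closely as possible (only the fourth argument differs: r10 instead of rcx, because syscall overwrites rcx): https://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions

Quick reminder of the clobbers:

cc means the flags register. But Peter Cordes comments that this is unnecessary here.
memory means that a pointer may be passed to the assembly and used to access memory

For an explicit minimal runnable example from scratch, see this answer: How to invoke a system call via syscall or sysenter in inline assembly?

Make some syscalls in assembly manually

Not very scientific, but fun: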
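A stripped-down variant of the same idea, without glibc's TYPEFY/ARGIFY machinery, might look like the sketch below. It is not glibc's code: raw_syscall6 and raw_mmap_anon are hypothetical names, and the sketch assumes GCC or Clang on x86-64 Linux. It uses mmap because that call really takes six arguments, so r10, r8 and r9 all get exercised.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE    /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>      /* PROT_*, MAP_* */
#include <sys/syscall.h>   /* SYS_mmap */

/* Hypothetical helper following the same register assignment as the
   glibc macro: rax = number, then rdi, rsi, rdx, r10, r8, r9. */
long raw_syscall6(long number, long a1, long a2, long a3,
                  long a4, long a5, long a6)
{
    long ret;
    /* r10, r8 and r9 have no single-letter constraint on x86-64, so they
       are pinned with explicit register variables. */
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;

    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (number), "D" (a1), "S" (a2), "d" (a3),
                        "r" (r10), "r" (r8), "r" (r9)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Map one anonymous, private, read/write region without calling libc's
   mmap(). Returns NULL if the kernel reports an error (-errno). */
void *raw_mmap_anon(size_t length)
{
    long ret = raw_syscall6(SYS_mmap, 0, (long) length,
                            PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (ret >= -4095 && ret <= -1) ? NULL : (void *) ret;
}
```

The "memory" clobber matters for calls like read or write, where the kernel accesses a buffer that the compiler only sees as an integer-valued pointer; rcx and r11 are listed because the syscall instruction overwrites them.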

x86_64.S

.text
.global _start
_start:
asm_main_after_prologue:
    /* write */
    mov $1, %rax    /* syscall number */
    mov $1, %rdi    /* stdout */
    mov $msg, %rsi  /* buffer */
    mov $len, %rdx  /* len */
    syscall

    /* exit */
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
len = . - msg

GitHub upstream.

Make system calls from C

Here's an example with register constraints: How to invoke a system call via syscall or sysenter in inline assembly?

aarch64

I've shown a minimal runnable userland example at: https://reverseengineering.stackexchange.com/questions/16917/arm64-syscalls-table/18834#18834 TODO: grep the kernel code here, should be easy.

score 12 · Accepted Answer

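Calling conventions define how parameters are passed in registers when calling, or being called by, another program. The best source for these conventions is the ABI standard defined for each hardware platform. For ease of compilation, the same ABI is also used by user-space and kernel programs. Linux and FreeBSD follow the same ABI on x86-64, and another one on 32-bit. The x86-64 ABI for Windows, however, differs from the Linux/FreeBSD one. Generally the ABI does not differentiate system calls from normal "function calls" (although, as the register tables in the other answers show, the Linux syscall interface uses r10 in place of rcx for the fourth argument). Here is a particular example of the x86_64 calling convention, which is the same for Linux user space and the kernel: http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/ (note the order of the parameters a, b, c, d, e, f):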