Ubuntu 20.04 glibc 2.31 RTFS + GDB
glibc 在 main 之前做了一些设置,以便它的一些功能可以工作。让我们尝试追踪它的源代码。
你好ç
#include <stdio.h>
int main() {
puts("hello");
return 0;
}
编译和调试:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out
现在在 GDB 中:
b main
r
bt -past-main
给出:
#0 main () at hello.c:3
#1 0x00007ffff7dc60b3 in __libc_start_main (main=0x555555555149 <main()>, argc=1, argv=0x7fffffffbfb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffbfa8) at ../csu/libc-start.c:308
#2 0x000055555555508e in _start ()
这已经包含 main 调用者的行:https://github.com/cirosantilli/glibc/blob/glibc-2.31/csu/libc-start.c#L308。
从 glibc 的遗留/通用性级别可以预期,该函数有十亿个 ifdef,但一些似乎对我们生效的关键部分应该简化为:
# define LIBC_START_MAIN __libc_start_main
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char **),
int argc, char **argv,
{
/* Initialize some stuff. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
exit (result);
}
Before __libc_start_main
are already at _start
,通过添加gcc -Wl,--verbose
我们知道它是入口点,因为链接描述文件包含:
ENTRY(_start)
因此是动态加载程序完成后执行的实际第一条指令。
为了确认在 GDB 中,我们通过编译来摆脱动态加载器-static
:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out
然后让GDB 在执行的第一条指令处停止starti
并打印第一条指令:
starti
display/12i $pc
这使:
=> 0x401c10 <_start>: endbr64
0x401c14 <_start+4>: xor %ebp,%ebp
0x401c16 <_start+6>: mov %rdx,%r9
0x401c19 <_start+9>: pop %rsi
0x401c1a <_start+10>: mov %rsp,%rdx
0x401c1d <_start+13>: and $0xfffffffffffffff0,%rsp
0x401c21 <_start+17>: push %rax
0x401c22 <_start+18>: push %rsp
0x401c23 <_start+19>: mov $0x402dd0,%r8
0x401c2a <_start+26>: mov $0x402d30,%rcx
0x401c31 <_start+33>: mov $0x401d35,%rdi
0x401c38 <_start+40>: addr32 callq 0x4020d0 <__libc_start_main>
通过查找源代码_start
并关注 x86_64 命中,我们看到这似乎对应于sysdeps/x86_64/start.S:58
:
ENTRY (_start)
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp
/* Extract the arguments as encoded on the stack and set up
the arguments for __libc_start_main (int (*main) (int, char **, char **),
int argc, char *argv,
void (*init) (void), void (*fini) (void),
void (*rtld_fini) (void), void *stack_end).
The arguments are passed via registers and on the stack:
main: %rdi
argc: %rsi
argv: %rdx
init: %rcx
fini: %r8
rtld_fini: %r9
stack_end: stack. */
mov %RDX_LP, %R9_LP /* Address of the shared library termination
function. */
#ifdef __ILP32__
mov (%rsp), %esi /* Simulate popping 4-byte argument count. */
add $4, %esp
#else
popq %rsi /* Pop the argument count. */
#endif
/* argv starts just at the current stack top. */
mov %RSP_LP, %RDX_LP
/* Align the stack to a 16 byte boundary to follow the ABI. */
and $~15, %RSP_LP
/* Push garbage because we push 8 more bytes. */
pushq %rax
/* Provide the highest stack address to the user code (for stacks
which grow downwards). */
pushq %rsp
#ifdef PIC
/* Pass address of our own entry points to .fini and .init. */
mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP
mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP
mov main@GOTPCREL(%rip), %RDI_LP
#else
/* Pass address of our own entry points to .fini and .init. */
mov $__libc_csu_fini, %R8_LP
mov $__libc_csu_init, %RCX_LP
mov $main, %RDI_LP
#endif
/* Call the user's main function, and exit with its value.
But let the libc call main. Since __libc_start_main in
libc.so is called very early, lazy binding isn't relevant
here. Use indirect branch via GOT to avoid extra branch
to PLT slot. In case of static executable, ld in binutils
2.26 or above can convert indirect branch into direct
branch. */
call *__libc_start_main@GOTPCREL(%rip)
最终__libc_start_main
按预期调用。
不幸-static
的是,bt
从main
没有显示尽可能多的信息:
#0 main () at hello.c:3
#1 0x0000000000402560 in __libc_start_main ()
#2 0x0000000000401c3e in _start ()
如果我们删除-static
并从 开始starti
,我们会得到:
=> 0x7ffff7fd0100 <_start>: mov %rsp,%rdi
0x7ffff7fd0103 <_start+3>: callq 0x7ffff7fd0df0 <_dl_start>
0x7ffff7fd0108 <_dl_start_user>: mov %rax,%r12
0x7ffff7fd010b <_dl_start_user+3>: mov 0x2c4e7(%rip),%eax # 0x7ffff7ffc5f8 <_dl_skip_args>
0x7ffff7fd0111 <_dl_start_user+9>: pop %rdx
通过 grepping 来源_dl_start_user
似乎来自sysdeps/x86_64/dl-machine.h:L147
/* Initial entry point code for the dynamic linker.
The C function `_dl_start' is the real entry point;
its return value is the user program's entry point. */
#define RTLD_START asm ("\n\
.text\n\
.align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
movq %rsp, %rdi\n\
call _dl_start\n\
_dl_start_user:\n\
# Save the user entry point address in %r12.\n\
movq %rax, %r12\n\
# See if we were run as a command with the executable file\n\
# name as an extra leading argument.\n\
movl _dl_skip_args(%rip), %eax\n\
# Pop the original argument count.\n\
popq %rdx\n\
这大概是动态加载程序的入口点。
如果我们中断_start
并继续,这似乎最终在与我们使用时相同的位置-static
,然后调用__libc_start_main
.
当我尝试使用 C++ 程序时:
你好.cpp
#include <iostream>
int main() {
std::cout << "hello" << std::endl;
}
和:
g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o hello.out hello.cpp
结果基本相同,例如,回溯处main
完全相同。
我认为 C++ 编译器只是调用钩子来实现任何 C++ 特定功能,并且在 C/C++ 中都很好地考虑了这些因素。
去做: