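Why does the Linux kernel generate a segfault on stack overflow? This makes debugging very awkward when alloca in C, or a temporary array created by Fortran, overflows the stack. Surely the runtime could produce a more helpful error.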
6 Answers
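You can actually catch the stack overflow condition with a signal handler.
To do this, you must do two things:
1. Set up a signal handler for SIGSEGV (the segfault) using sigaction, and set the SA_ONSTACK flag when you do. This tells the kernel to use an alternate stack when delivering the signal.
2. Call sigaltstack() to set up the alternate stack that the SIGSEGV handler will use.
Then, when you overflow the stack, the kernel switches to your alternate stack before delivering the signal. Inside the signal handler you can examine the address that caused the fault and determine whether it was a stack overflow or an ordinary fault.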
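The two steps above could be sketched in C roughly as follows. This is a minimal sketch, not a definitive implementation: the names `install_overflow_handler`, `looks_like_stack_overflow` and `on_segv`, the 64 KiB alternate stack and the 64 KiB slack are all assumptions made for illustration. The classification is only a heuristic for the main thread: it treats a fault as a stack overflow when the faulting address lies between the lowest address allowed by RLIMIT_STACK (minus some slack for the guard region) and an address recorded near the top of the stack.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

static uintptr_t g_stack_top;          /* address near the top of the main stack */
static uintptr_t g_stack_limit;        /* RLIMIT_STACK in bytes */
static char g_alt_stack[1 << 16];      /* assumed size for the alternate stack */

/* Hypothetical heuristic: does addr fall inside the stack's reserved range,
 * or within `slack` bytes below it (where the guard region would be hit)?
 * With an unlimited stack every address below stack_top matches. */
static int looks_like_stack_overflow(uintptr_t addr, uintptr_t stack_top,
                                     uintptr_t limit, uintptr_t slack)
{
    uintptr_t low = limit < stack_top ? stack_top - limit : 0;
    uintptr_t floor = low > slack ? low - slack : 0;
    return addr >= floor && addr <= stack_top;
}

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    static const char overflow_msg[] = "fatal: stack overflow\n";
    static const char other_msg[] = "fatal: segmentation fault\n";
    ssize_t n;
    (void)sig;
    (void)ctx;
    if (looks_like_stack_overflow((uintptr_t)info->si_addr, g_stack_top,
                                  g_stack_limit, 65536))
        n = write(STDERR_FILENO, overflow_msg, sizeof overflow_msg - 1);
    else
        n = write(STDERR_FILENO, other_msg, sizeof other_msg - 1);
    (void)n;
    _exit(1);
}

/* Pass the address of a local variable in main() as near_stack_top.
 * Returns 0 on success, -1 on failure. */
static int install_overflow_handler(const void *near_stack_top)
{
    struct rlimit rl;
    stack_t ss;
    struct sigaction sa;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    g_stack_top = (uintptr_t)near_stack_top;
    g_stack_limit = rl.rlim_cur == RLIM_INFINITY ? UINTPTR_MAX
                                                 : (uintptr_t)rl.rlim_cur;

    /* Step 2 of the answer: register the alternate stack. */
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = g_alt_stack;
    ss.ss_size = sizeof g_alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    /* Step 1 of the answer: SIGSEGV handler delivered on that stack. */
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

To use it, declare a local variable at the start of main() and call `install_overflow_handler(&that_variable)` before doing any deep recursion or large stack allocation. Threads other than the main thread have their own stacks, so they would need their bounds recorded separately.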
The "kernel" (it's actually not the kernel running your code, it's the CPU) doesn't know how your code is referencing the memory it's not supposed to be touching. It only knows that you tried to do it.
The code:
char *x = alloca(100);
char y = x[150];
can't really be evaluated by the CPU as you trying to access beyond the bounds of x.
You may hit the exact same address with:
char y = *((char*)(0xdeadbeef));
BTW, I would discourage the use of alloca, since the stack tends to be much more limited than the heap (use malloc instead).
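As a small illustration of that advice, here is a sketch of a scratch buffer taken from the heap. Unlike alloca, malloc has a failure channel: it returns NULL, which the program can check and report. The function name `sum_scratch` and what it computes are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fill an n-byte scratch buffer with 1s and sum it.
 * Returns the sum, or -1 if the buffer could not be allocated. */
long sum_scratch(size_t n)
{
    unsigned char *buf;
    long total = 0;
    size_t i;

    if (n == 0)
        return 0;
    buf = malloc(n);
    if (buf == NULL)
        return -1;              /* allocation failure is reported, not a segfault */
    memset(buf, 1, n);
    for (i = 0; i < n; i++)
        total += buf[i];
    free(buf);
    return total;
}
```

A 16 MiB request like `sum_scratch((size_t)16 << 20)` works here, while the same size passed to alloca would exceed a typical 8 MiB default stack limit.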
A stack overflow is a kind of segmentation fault: you have gone beyond the bounds of the memory you were initially allocated. The stack is of finite size, and you have exceeded it. You can read more about it on Wikipedia.
Additionally, one thing I've done for projects in the past is write my own signal handler for SIGSEGV (see the signal(2) man page). I usually caught the signal and wrote "Fatal error has occurred" to the console. I did some further work with checkpoint flags and debugging.
In order to debug segfaults you can run a program in GDB. For example, the following C program (segfault.c) will segfault:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    printf("Starting\n");
    void *foo = malloc(1000);
    memcpy(foo, 0, 100); /* this line will segfault: the source pointer is null */
    exit(0);
}
If I compile it like so:
gcc -g -o segfault segfault.c
and then run it like so:
$ gdb ./segfault
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /tmp/segfault
Starting
Program received signal SIGSEGV, Segmentation fault.
0x4ea43cbc in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x4ea43cbc in memcpy () from /lib/libc.so.6
#1 0x080484cb in main () at segfault.c:8
(gdb)
I find out from GDB that there was a segmentation fault on line 8. Of course there are more complex ways of handling stack overflows and other memory errors, but this will suffice.
Just use Valgrind. It will point out all of your memory allocation errors with extreme precision.
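Some of the comments are helpful, but the problem is not one of memory allocation errors. That is, there is no mistake in the code. It is quite a nuisance in Fortran that the runtime allocates temporary values on the stack. As a result, a statement such as write(fp)x,y,z can trigger a segfault with no warning. The Intel Fortran compiler's technical support say there is no way for the runtime library to print a more helpful message. However, if Miguel is right, then this should be possible, as he suggests. Thanks a lot. The remaining question is how I first find the address of the segfault, and then figure out whether it came from a stack overflow or from some other problem.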
For anyone else who finds this question: there is a compiler flag that puts temporary variables above a certain size on the heap.
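The same idea can be applied by hand in C: keep small temporaries on the stack and move large ones to the heap. The sketch below is only an illustration of that size-threshold policy, not what any compiler actually emits. The function `rotate_left` and the threshold `STACK_TEMP_INTS` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_TEMP_INTS 1024   /* assumed threshold: larger temporaries go to the heap */

/* Rotate the n-element array a left by k positions.
 * Returns 0 on success, -1 if a heap temporary could not be allocated. */
int rotate_left(int *a, size_t n, size_t k)
{
    int small[STACK_TEMP_INTS];    /* fixed-size stack temporary */
    int *tmp = small;

    if (n == 0)
        return 0;
    k %= n;
    if (k == 0)
        return 0;
    if (k > STACK_TEMP_INTS) {     /* too large for the stack buffer */
        tmp = malloc(k * sizeof *tmp);
        if (tmp == NULL)
            return -1;
    }
    memcpy(tmp, a, k * sizeof *a);              /* save the first k elements */
    memmove(a, a + k, (n - k) * sizeof *a);     /* shift the rest down */
    memcpy(a + (n - k), tmp, k * sizeof *a);    /* append the saved elements */
    if (tmp != small)
        free(tmp);
    return 0;
}
```

The stack usage of this function is bounded by the threshold regardless of the input size, and an oversized temporary fails with an error code instead of a segfault.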
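A stack overflow does not necessarily cause a crash. It may silently trash your program's data and carry on executing.
I wouldn't use SIGSEGV handler kludges; I would fix the original problem instead.
If you want automated help, you can use gcc's -fstack-protector option, which will spot some overflows at runtime and abort the program. (The -Wstack-protector option is only a warning that reports functions left unprotected.)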
Valgrind is good for dynamic memory allocation errors, but not for stack errors.