void myFunc(char dummy) {
    char *addrFirstArg = &dummy;
}

int main() {
    char dummy = 42;
    myFunc(dummy);
    return 0;
}

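I run the above code under gdb and set a breakpoint at myFunc. I step until addrFirstArg has been computed and then examine it.

I also run

info frame

which prints information about the frame of myFunc. As far as I understand the C stack implementation, I expect the value of addrFirstArg to be 8 bytes above the base pointer of myFunc's frame.

This is the output I see: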

(gdb) p &dummy
$1 = 0xffffd094 "*\202\f\b\032\004"

(gdb) info frame
Stack level 0, frame at 0xffffd0b0:
 eip = 0x8048330 in findStackBottom (reporter.c:64); saved eip 0x8048478
 called by frame at 0xffffd170
 source language c.
 Arglist at 0xffffd0a8, args: dummy=42 '*'
 Locals at 0xffffd0a8, Previous frame's sp is 0xffffd0b0
 Saved registers:
 ebp at 0xffffd0a8, eip at 0xffffd0ac

(gdb) x/1c 0xffffd0b0
0xffffd0b0:     42 '*'

So, inside the frame of myFunc, ebp points to location 0xffffd0a8, whereas the address of dummy is 0xffffd094, which is 0x14 bytes below ebp rather than 0x8 bytes above it.
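The offsets can be double-checked with a small sketch. The helper name frame_offset is hypothetical; the addresses are the ones printed in the gdb session above.

```c
#include <stdint.h>

/* Hypothetical helper: signed distance in bytes from ebp to addr.
   Both arguments are 32-bit addresses as printed by gdb. */
static int64_t frame_offset(uint32_t addr, uint32_t ebp)
{
    return (int64_t)addr - (int64_t)ebp;
}
```

Calling frame_offset(0xffffd094, 0xffffd0a8) gives the position of dummy relative to ebp, and frame_offset(0xffffd0b0, 0xffffd0a8) gives the position of the address examined with x/1c.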

This "discrepancy" disappears if I declare dummy as an int and make myFunc take an int argument.

I'm really intrigued by this behavior. It was reproducible - I ran it a bunch of times.


1 Answer


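You see the differences better if you use gcc -S. In the char case we have the code on the left; the right-hand column shows only the lines that differ in the int case, with x marking an instruction that is absent there.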

char case                       int case (diffs)

pushl   %ebp
movl    %esp, %ebp
subl    $20, %esp               subl    $16, %esp
movl    8(%ebp), %eax           x
movb    %al, -20(%ebp)          x
leal    -20(%ebp), %eax         leal    8(%ebp), %eax
movl    %eax, -4(%ebp)
leave
ret

When the function is entered, the stack looks like this (top of the stack first):

esp     return address
esp+4   2A 00 00 00

This is because the single char is "pushed" on the stack this way

movsbl  -1(%ebp), %eax
movl    %eax, (%esp)

and the x86 is little endian.
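As a rough illustration of why the slot reads 2A 00 00 00, the sketch below widens a signed 8-bit value to 32 bits (what movsbl does) and lists the resulting bytes from the lowest address upwards. The function names are hypothetical, and the sketch assumes char is signed, as it is with gcc on x86.

```c
#include <stdint.h>

/* Widen a signed 8-bit value to 32 bits with sign extension, like movsbl. */
static uint32_t widen_like_movsbl(int8_t c)
{
    return (uint32_t)(int32_t)c;
}

/* Write the four bytes of slot in little-endian order (lowest address
   first), which is how x86 lays out the value stored for the call. */
static void slot_bytes_le(uint32_t slot, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(slot >> (8 * i));
}
```

For the value 42 the first byte is 0x2A and the other three are zero; a negative char would fill the upper bytes with FF instead.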

After the function prologue (the "preamble") the situation is like this:

esp            (room for local char dummy - byte 42) ...
...
ebp-4          room for char *
esp+20 = ebp   saved ebp
ebp+4          return addr
ebp+8          2A 00 00 00       

The "char" (stored as a 32-bit integer) is then loaded from ebp+8 (the original value "pushed" by main, as a 32-bit quantity) into eax, and its least significant byte is stored in a local slot.

The int case is simpler: no extra copy or realignment is needed, and we can "directly" take the address of whatever was on the stack.

esp             ...
...
ebp-4          room for int *
esp+16 = ebp   saved ebp
ebp+4          return addr
ebp+8          2A 00 00 00       

So, in the first case (the char case), esp is decremented by 4 more bytes to hold the single char: there is an extra local slot.

Why this?

As you have seen, the single char is pushed on the stack as a 32-bit "integer" (eax), and it is read back into eax in the same way. This operation has no endianness problem.

But what if the function handed back the address ebp+8 for the char and the machine were not little endian? In that case ebp+8 would point to 00 00 00 2A, and dereferencing the pointer (*addrFirstArg) would give 0, not 42.
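A minimal sketch of that thought experiment: the hypothetical function below returns the byte that sits at the lowest address of a 32-bit slot for either byte order. It uses shifts rather than pointer casts, so the result does not depend on the machine it runs on.

```c
#include <stdint.h>

/* Byte found at the start (lowest address) of a 32-bit slot holding value.
   big_endian != 0 selects the big-endian layout, 0 the little-endian one. */
static uint8_t byte_at_slot_start(uint32_t value, int big_endian)
{
    return big_endian ? (uint8_t)(value >> 24) : (uint8_t)(value & 0xFFu);
}
```

With value 42, the little-endian layout yields 42 at the slot's address, while the big-endian layout yields 0, which is what a char pointer to ebp+8 would read.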

So, once the "fake int" has been loaded into a register (an operation the CPU handles consistently whatever the endianness), the least significant byte must be stored in a local slot so that its address is guaranteed to point to that char when dereferenced. This is the reason for the extra code and for ebp+8 not being used: endianness, together with alignment requirements on the address (e.g. the 2A in 00 00 00 2A in the big-endian case would sit at an odd address).
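The copy can be mimicked by hand in C. This is only an analogue of the generated code under the assumptions above, not what the compiler literally does; my_func_lowered and same_address are hypothetical names, and the int parameter stands in for the 32-bit slot at ebp+8.

```c
#include <stddef.h>

/* Hand-written analogue of myFunc(char dummy): the caller passes a full
   int-sized slot, the callee copies the low byte into its own char-sized
   local and takes the address of that copy. */
static char my_func_lowered(int slot, int *same_address)
{
    char dummy = (char)slot;       /* like: movb %al, -20(%ebp)    */
    char *addrFirstArg = &dummy;   /* like: leal -20(%ebp), %eax   */

    /* Report whether the pointer refers to the incoming slot itself. */
    if (same_address != NULL)
        *same_address = ((void *)addrFirstArg == (void *)&slot);
    return *addrFirstArg;
}
```

The pointer always reads back the char value, and it refers to the local copy rather than to the incoming slot, which matches the address gdb reports below ebp.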

Answered 2012-05-26T08:43:58.027