Here is a simple program. I've pasted the assembly generated for x86_64 interleaved with the C source code.

int main()
{
  4004b4:   55                      push   %rbp
  4004b5:   48 89 e5                mov    %rsp,%rbp
    int array[10];

    array[0] = 5;
  4004b8:   c7 45 d0 05 00 00 00    movl   $0x5,-0x30(%rbp)

    return 0;
  4004bf:   b8 00 00 00 00          mov    $0x0,%eax
}

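I know about decompilation tools such as IDA Pro and dcc, but I don't know how these programs work out details like array bounds. More generally, is there a way to tell, just by looking at the assembly, that movl $0x5,-0x30(%rbp) is actually an access to int array[10]? I can see that if the program is compiled with debug information, i.e. with -g, then objdump does show the source code and we can figure it out. How do commercial decompilers solve this when the binary lacks debug information?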

1 Answer

I don't think you can figure that out in your example. There's too little code in that function.

If it were a bigger function that used the array several times, you might find hints pointing to that, such as the same base address combined with different offsets recurring throughout the generated machine code.

Weak assumption:

for (i = 0; i < 10; i++)
        array[i] = i * 2;

This would allow you to assume, by looking at the generated code, that you're dealing with an array of 10 ints.

Stronger case:

int *array = NULL;
array = malloc(10 * sizeof *array);
if (array == NULL)
        return ENOMEM;
for (i = 0; i < 10; i++)
        array[i] = i * 2;

Here the allocation size is passed explicitly to malloc, so you can be far more confident that you're dealing with an array of 10 ints.

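In your case you only have the raw information: a single 4-byte store at -0x30(%rbp), i.e. 48 bytes below the frame pointer. That is consistent with 10 * sizeof(int) = 40 bytes of array plus some padding, but it doesn't prove it, and the listing doesn't even show the stack pointer being adjusted to reserve the space. (How the frame is laid out depends on the compiler and the optimizer as well, but that's another topic.)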

So it's all about the heuristics and code pattern recognition algorithms that programs like IDA use to feed you as much reliable information as possible.

The rest is up to the reverser's experience.

Answered 2012-08-07T18:23:36.890