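When I ran the same task under valgrind, it reported 97487, which is almost the same as the previous result. But when I used perf, the count was 421,256. What is the reason for this discrepancy between the tools?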
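My guess is that perf is giving you both user-mode and kernel-mode instructions (this is the default). Please try
perf stat -e instructions:u your_executable
This should count only the instructions executed in user mode. More details are in the perf tutorial.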
To find out more, I compiled the C program to x86 assembly, and the
output consists of about 20-30 lines of assembly instructions. But when
I used objdump to disassemble the binary, the result was 200-300 lines
of assembly instructions. I was not able to figure out the reason for
this difference either.
In the first case you get only the assembly instructions for your own code. In the second case you get all the instructions contained in the executable, including the C runtime startup code that the linker adds. Please compile
int main() { }
and run objdump -d name_of_the_executable. As you will see, many things happen before main() is executed, and after main() has finished, clean-up code is executed.
"Linux x86 Program Start Up or - How the heck do we get to main()?" seems like a nice tutorial.