While working on a university project, I used an in-house profiler built by an older student. It was very basic but good enough, since its job was to measure the elapsed time between two points in the code and report statistics.

Now, how do professional profilers work? Do they preprocess the code to insert checkpoints or something similar? Do they read the binary together with its debugging data to capture where functions are called from?

Thanks.
There are many different profilers which work in different ways.
Commonly used profilers simply examine the running program at regular intervals to see which assembly instruction is currently being executed (the program counter) and which routines called the current function (the call stack). This kind of sampling profiler can work with standard binaries, but it is more useful if you have debugging symbols to map addresses in the program back to lines of code.
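A minimal sketch of that sampling idea in C++ on POSIX, using a SIGPROF timer and glibc's backtrace (illustration only: the handler calls functions that are not async-signal-safe, which real profilers avoid):

    // Minimal illustration of a sampling profiler: a timer signal fires
    // periodically and we record the current call stack at that moment.
    // (POSIX + glibc specific; for illustration, not production use.)
    #include <csignal>
    #include <sys/time.h>
    #include <execinfo.h>

    static void on_sample(int) {
        void *frames[64];
        int n = backtrace(frames, 64);          // capture the current call stack
        backtrace_symbols_fd(frames, n, 2);     // dump it to stderr (illustration only)
    }

    int main() {
        std::signal(SIGPROF, on_sample);

        itimerval tv{};
        tv.it_interval.tv_usec = 10000;         // sample every 10 ms of CPU time
        tv.it_value.tv_usec    = 10000;
        setitimer(ITIMER_PROF, &tv, nullptr);   // ITIMER_PROF delivers SIGPROF

        volatile double x = 0;
        for (long i = 0; i < 200000000; ++i)    // some work to be sampled
            x += i * 0.5;
        return 0;
    }

A real sampling profiler would record the stacks into a buffer and aggregate them after the run, then use the debugging symbols to turn the addresses into function names and line numbers.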
As well as sampling regularly, you can also use processor performance counters to sample after a certain number of events such as cache misses, which will help you see which parts of your program are slowing down due to memory accesses.
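On Linux those counters are exposed through the perf_event_open system call. The sketch below does not sample; it just shows the basic mechanism by reading a hardware cache-miss counter around a region of code (the flags chosen are one plausible configuration, and the call may require relaxed perf_event_paranoid settings):

    // Rough sketch: read a hardware cache-miss counter around a block of code
    // using Linux's perf_event_open (no glibc wrapper, so we use syscall()).
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstdint>
    #include <vector>

    int main() {
        perf_event_attr attr{};
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CACHE_MISSES;  // count last-level cache misses
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        // May fail without sufficient permissions (/proc/sys/kernel/perf_event_paranoid).
        int fd = syscall(SYS_perf_event_open, &attr, 0 /*this process*/,
                         -1 /*any CPU*/, -1 /*no group*/, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        std::vector<int> data(1 << 24, 1);

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        long long sum = 0;
        for (std::size_t i = 0; i < data.size(); i += 16)  // strided access: poor locality
            sum += data[i];

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t misses = 0;
        read(fd, &misses, sizeof(misses));
        std::printf("cache misses: %llu (sum=%lld)\n",
                    (unsigned long long)misses, sum);
        close(fd);
        return 0;
    }

A profiler uses the same interface in sampling mode, asking the kernel to deliver a sample every N events instead of reading a single total.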
Other profilers involve recompiling the program to insert instructions (known as instrumentation) to count how often each contiguous run of instructions (basic block) is executed, or even to record the sequence in which basic blocks are executed, or to record the contents of variables at certain places.
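GCC and Clang expose a simple flavour of compiler-inserted instrumentation via -finstrument-functions, which makes the compiler call two hooks you provide on every function entry and exit. A minimal sketch (function-level rather than basic-block-level, but the principle is the same):

    // Build with:  g++ -finstrument-functions demo.cpp -o demo
    // The compiler inserts calls to the two hooks below around every function,
    // which is a coarse form of automatic instrumentation.
    #include <cstdio>

    extern "C" {
    __attribute__((no_instrument_function))
    void __cyg_profile_func_enter(void *fn, void *call_site) {
        std::fprintf(stderr, "enter %p (called from %p)\n", fn, call_site);
    }

    __attribute__((no_instrument_function))
    void __cyg_profile_func_exit(void *fn, void *call_site) {
        (void)call_site;
        std::fprintf(stderr, "exit  %p\n", fn);
    }
    }

    static int work(int n) { return n * 2; }

    int main() {
        return work(21) - 42;   // the hooks fire for main() and work()
    }

A real instrumenting profiler would record timestamps or counters in these hooks instead of printing, and resolve the function addresses to names afterwards.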
The instrumentation approach can give you all the precision and data you might want, but it slows the program down, and that in turn changes its performance characteristics. By contrast, with sampling-based approaches you can trade off the performance impact and the length of time you need to run the program against the accuracy of the profile data you obtain.
There are two common profiling strategies (for VM-based languages, anyway): instrumentation and sampling.

Instrumentation inserts checkpoints that notify the profiler every time a method starts and finishes. This can be done by the JIT/interpreter, or by a post-compilation but pre-execution phase which just changes the executable. It has a very significant effect on performance (and thus skews any timing results), but it is good for getting accurate counts.

Sampling periodically asks the VM what the stack traces of all threads look like, and updates its statistics that way. This typically has a smaller impact on performance, but produces less accurate call counts.
As Jon Skeet wrote above, there are two strategies: instrumentation and sampling.
Instrumentation can be done both manually and automatically. In the manual case, the developer inserts code to track the start/end of a region of interest, for example a simple "StartTimer" and "EndTimer" pair. Some profiler tools can do this automatically as well: the profiler performs a static analysis of the code, i.e. it parses the code and identifies important checkpoints like the start/end of particular methods. This is easiest with languages that support reflection (e.g. any .NET language); using reflection the profiler is able to rebuild the entire source code tree (along with call graphs).
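In C++ the manual StartTimer/EndTimer pattern is often wrapped in a small RAII helper; here is a sketch (the ScopedTimer class and its output format are made up for illustration):

    // Sketch of manual instrumentation: a scoped timer that measures the
    // elapsed wall-clock time of a region of code and prints it on exit.
    #include <chrono>
    #include <cstdio>

    class ScopedTimer {                       // hypothetical helper, not a real library
    public:
        explicit ScopedTimer(const char *label)
            : label_(label), start_(std::chrono::steady_clock::now()) {}
        ~ScopedTimer() {
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - start_).count();
            std::printf("%s: %lld us\n", label_, static_cast<long long>(us));
        }
    private:
        const char *label_;
        std::chrono::steady_clock::time_point start_;
    };

    long long busy_work() {
        long long s = 0;
        for (int i = 0; i < 10000000; ++i) s += i;
        return s;
    }

    int main() {
        ScopedTimer t("busy_work");           // "StartTimer" on construction,
        return busy_work() > 0 ? 0 : 1;       // "EndTimer" when t goes out of scope
    }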
Sampling is done by the profiler looking into the binary code of the running program. The profiler can also use techniques like hooks, or trap Windows events/messages, for the purpose of profiling.
Both instrumentation and sampling have their own overheads. The amount of overhead varies; for example, if the sampling frequency is set too high, the profiling itself can contribute significantly to the performance being reported.

Instrumentation vs. sampling: it is not that one approach is better than the other; both have their place.
The best approach is to start with a sampling-based profiler and look at the whole system level. That is, run the sampler and look at system-wide resource usage: memory, hard disk, network, CPU.

From that, identify the resources that are getting choked.
With that information you can add instrumentation to your code to pinpoint the culprit. For example, if memory is the most heavily used resource, it helps to instrument your memory-allocation-related code. Note that with instrumentation you are really concentrating on a particular area of your code.
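As a sketch of that kind of targeted instrumentation in C++, one crude approach is to replace the global operator new/delete and count allocations and bytes (real memory profilers are far more sophisticated, but the idea of instrumenting the allocation path is the same):

    // Sketch: count heap allocations by replacing the global operator new/delete.
    #include <cstdio>
    #include <cstdlib>
    #include <new>
    #include <atomic>
    #include <vector>

    static std::atomic<std::size_t> g_allocs{0};
    static std::atomic<std::size_t> g_bytes{0};

    void *operator new(std::size_t size) {
        g_allocs.fetch_add(1, std::memory_order_relaxed);
        g_bytes.fetch_add(size, std::memory_order_relaxed);
        if (void *p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }

    void operator delete(void *p) noexcept { std::free(p); }
    void operator delete(void *p, std::size_t) noexcept { std::free(p); }

    int main() {
        std::vector<int> v;
        for (int i = 0; i < 1000; ++i) v.push_back(i);   // triggers reallocations
        std::printf("allocations: %zu, bytes: %zu\n",
                    g_allocs.load(), g_bytes.load());
        return 0;
    }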
For gprof on *nix, using -pg when compiling and linking injects some extra code into the object code. Running the instrumented program then produces profiling data, and running gprof on it generates a report.
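A typical workflow looks something like this (exact flags can vary by toolchain, but gmon.out is gprof's default data file):

    # build with profiling enabled (-pg is needed at both compile and link time)
    gcc -pg -O2 -o myprog myprog.c
    # run the program; on exit it writes gmon.out in the current directory
    ./myprog
    # turn the collected data into a readable report (flat profile + call graph)
    gprof myprog gmon.out > report.txt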