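On a system with an ARM A9 processor, an L2 cache, and SRAM, is it possible to write a C program that obtains the following performance data:
- Average SRAM data fetch latency.
- Average instruction fetch latency.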
If you have hardware targets to run and measure on, you can write test code that takes cycle counts between different points of its execution using the Cortex-A9 PMU (see the A9 TRM, chapter 11). The test code needs to initialize the PMU registers and then read them. The PMU counts cycles and can also report other useful events, e.g. the number of cache misses. That much is doable in software.
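A minimal sketch of that PMU setup, assuming GCC or Clang inline assembly and code running in a privileged mode on an ARMv7-A core such as the Cortex-A9. The helper names (`pmu_init`, `pmu_cycles`, `pmu_elapsed`, `measure_cycles`, `empty_workload`) are made up for illustration. The non-ARM branch is only a stand-in based on `clock()` so that the file still builds on a development host; it does not count CPU cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__arm__)
/* ARMv7-A PMU through CP15. Needs a privileged mode unless user
 * access has been enabled in PMUSERENR. */
static inline void pmu_init(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));  /* read PMCR */
    pmcr |= (1u << 0) | (1u << 1) | (1u << 2); /* E: enable, P: reset event counters, C: reset cycle counter */
    pmcr &= ~(1u << 3);                        /* D = 0: count every cycle, not every 64th */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr));  /* write PMCR */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31)); /* PMCNTENSET: enable cycle counter */
}

static inline uint32_t pmu_cycles(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles)); /* read PMCCNTR */
    return cycles;
}
#else
/* Host stand-in (assumption): clock() ticks, not CPU cycles. */
static inline void pmu_init(void) {}
static inline uint32_t pmu_cycles(void) { return (uint32_t)clock(); }
#endif

/* Unsigned subtraction stays correct across one 32-bit wraparound. */
static inline uint32_t pmu_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical workload; replace with the code under test. */
static void empty_workload(void) {}

/* Counter ticks spent in fn(), including the call overhead. */
static inline uint32_t measure_cycles(void (*fn)(void))
{
    assert(fn != NULL);
    uint32_t start = pmu_cycles();
    fn();
    uint32_t end = pmu_cycles();
    return pmu_elapsed(start, end);
}
```

Call `pmu_init()` once, then wrap the code of interest, for example `uint32_t c = measure_cycles(empty_workload);`. Measuring an empty function first gives an estimate of the measurement overhead to subtract from later results.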
However, the resulting performance data may not be as low-level as you want.
Consider a loop over a block of NOP instructions, with the loop counter held in a register. The L1 instruction cache fills on the first iteration, so later iterations are fetched from L1. The PMU can give you the number of cycles the loop took, and hence the total time. That measurement relates to the L1 instruction fetch delay (unless the block is too big to fit in the L1 instruction cache, in which case it might shed light on L2).
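The NOP loop could be sketched as follows. This assumes GCC or Clang (inline assembly and `__attribute__((noinline))`), and on ARM it assumes the PMU cycle counter has already been enabled from privileged code. `read_cycles`, `nop_loop`, and `time_nop_loop` are illustrative names, and the block size of 64 NOPs is an arbitrary choice. On a non-ARM host the counter falls back to `clock()` ticks so the code still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__arm__)
/* Assumes the PMU cycle counter (PMCCNTR) is already enabled. */
static inline uint32_t read_cycles(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
    return cycles;
}
#else
/* Host stand-in (assumption): clock() ticks, not CPU cycles. */
static inline uint32_t read_cycles(void) { return (uint32_t)clock(); }
#endif

#define NOP4  __asm__ volatile("nop\n\tnop\n\tnop\n\tnop")
#define NOP16 NOP4; NOP4; NOP4; NOP4
#define NOP64 NOP16; NOP16; NOP16; NOP16
#define NOPS_PER_ITERATION 64u

/* Kept out of line so the warm-up pass and the timed pass execute
 * the same instructions at the same addresses. */
__attribute__((noinline)) static void nop_loop(uint32_t iterations)
{
    for (uint32_t i = 0; i < iterations; i++) {
        NOP64;
    }
}

/* Returns counter ticks for `iterations` passes over the NOP block. */
static uint32_t time_nop_loop(uint32_t iterations)
{
    assert(iterations > 0);
    nop_loop(1);                    /* warm-up: pulls the loop into the I-cache */
    uint32_t start = read_cycles();
    nop_loop(iterations);
    uint32_t end = read_cycles();
    return end - start;             /* unsigned math tolerates one wraparound */
}
```

Dividing the result by `iterations * NOPS_PER_ITERATION` gives an average cost per instruction. That figure still includes loop overhead and pipeline effects, so it is an upper bound on the fetch cost rather than a pure fetch latency. Growing the block beyond the L1 instruction cache size is the way to make L2 fetches show up.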
Similarly, you could construct test code whose execution time also includes the effect of data fetch delay.
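One common way to build such a data-side test is pointer chasing, where every load depends on the result of the previous one, so the loads cannot overlap. This is a swapped-in technique rather than anything taken from the ARM documentation, and the names below (`build_chain`, `chase`, `time_chase`) are made up for the sketch. To target SRAM, the buffer would have to be placed in the SRAM region, which depends on the platform's linker script and memory attributes and is not shown. On ARM the code assumes the PMU cycle counter is already enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__arm__)
/* Assumes the PMU cycle counter (PMCCNTR) is already enabled. */
static inline uint32_t read_cycles(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
    return cycles;
}
#else
/* Host stand-in (assumption): clock() ticks, not CPU cycles. */
static inline uint32_t read_cycles(void) { return (uint32_t)clock(); }
#endif

/* Link buf[0..n-1] into a ring: element i holds the index of the next
 * element to visit. The ring covers every element only when stride
 * and n share no common factor. */
static void build_chain(size_t *buf, size_t n, size_t stride)
{
    assert(n > 0 && stride > 0);
    for (size_t i = 0; i < n; i++) {
        buf[i] = (i + stride) % n;
    }
}

/* Follow the chain; volatile forces a real load on every step. */
static size_t chase(const volatile size_t *buf, size_t start, size_t steps)
{
    size_t idx = start;
    for (size_t i = 0; i < steps; i++) {
        idx = buf[idx];             /* next address depends on this load */
    }
    return idx;
}

/* Returns counter ticks for `steps` dependent loads starting at index 0. */
static uint32_t time_chase(const volatile size_t *buf, size_t steps,
                           size_t *end_index)
{
    assert(end_index != NULL);
    uint32_t start = read_cycles();
    size_t idx = chase(buf, 0, steps);
    uint32_t end = read_cycles();
    *end_index = idx;               /* keeps the result observable */
    return end - start;
}
```

The elapsed count divided by `steps` approximates the average cost of one dependent load plus a small loop overhead. A buffer that fits in the L1 data cache measures L1 hits; a larger buffer, or one mapped as non-cacheable, pushes the accesses out to L2 or the backing memory.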
ARM provides example code that shows how the PMU can be used.
You may find the processor internals complicated. If L2 is your primary interest, the cache controller, e.g. L2C-310, may have its own event counters, although I haven't used them.