performance - Calculating the speed of routines?

Question

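What is the best and most accurate way to determine how long it takes to process a routine, such as a procedure or function?

I ask because I am currently trying to optimize a few functions in my application. When I test the changes, it is hard to tell just by looking whether there was any improvement at all. If I could get an accurate or near-accurate time for processing a routine, I would have a much clearer idea of how well any changes to the code have worked.

I considered using GetTickCount, but I am unsure whether it would be anywhere near accurate.

It would be useful to have a reusable function/procedure to calculate the time of a routine, and use it something like this: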

// < prepare for calculation of code
...
ExecuteSomeCode; // < code to test
...
// < stop calculating code and return time it took to process

I look forward to hearing some suggestions.

Thanks.

Craig.

score 24 · Accepted Answer

As far as I know, the most accurate method is to use QueryPerformanceCounter together with QueryPerformanceFrequency:

Code:

var
  Freq, StartCount, StopCount: Int64;
  TimingSeconds: real;
begin
  QueryPerformanceFrequency(Freq);
  QueryPerformanceCounter(StartCount);
  // Execute process that you want to time: ...
  QueryPerformanceCounter(StopCount);
  TimingSeconds := (StopCount - StartCount) / Freq;
  // Display timing: ... 
end;
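
The question asks for a reusable routine, so here is a minimal sketch of how the snippet above could be wrapped into one. The names TSimpleProc, TimeRoutine and ExecuteSomeCode are hypothetical and chosen only for illustration, and the Sleep call merely stands in for the code under test.

```pascal
program TimeRoutineDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical helper type: any parameterless procedure can be timed.
  TSimpleProc = procedure;

// Returns the elapsed wall-clock time of one call to Routine, in seconds.
function TimeRoutine(Routine: TSimpleProc): Double;
var
  Freq, StartCount, StopCount: Int64;
begin
  // The call fails only if no high-resolution counter is available.
  if not QueryPerformanceFrequency(Freq) then
    raise Exception.Create('High-resolution counter not available');
  QueryPerformanceCounter(StartCount);
  Routine;
  QueryPerformanceCounter(StopCount);
  Result := (StopCount - StartCount) / Freq;
end;

// Placeholder workload standing in for the code under test.
procedure ExecuteSomeCode;
begin
  Sleep(50);
end;

begin
  Writeln(Format('ExecuteSomeCode took %.6f s', [TimeRoutine(ExecuteSomeCode)]));
end.
```

Taking a procedural parameter keeps the timing logic in one place; routines that need arguments would have to be wrapped in a parameterless procedure first.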

score 18 · Accepted Answer

Try Eric Grange's Sampling Profiler.

Answered 2011-05-17T12:20:44.820

score 14 · Accepted Answer

From Delphi 6 upwards you can use the x86 Timestamp counter.
This counts CPU cycles; on a 1 GHz processor, each count takes one nanosecond.
It does not get more accurate than that.

function RDTSC: Int64; assembler;
asm
  // RDTSC can be executed out of order, so the pipeline needs to be flushed
  // to prevent RDTSC from executing before your code is finished.  
  // Flush the pipeline
  XOR eax, eax
  PUSH EBX
  CPUID
  POP EBX
  RDTSC  //Get the CPU's time stamp counter.
end;

On x64 the following code is more accurate, because it does not suffer from the delay of CPUID.

  rdtscp        // On x64 we can use the serializing version of RDTSC
  push rbx      // Serialize the code after, to avoid OoO sneaking in
  push rax      // subsequent instructions prior to executing RDTSCP.
  push rdx      // See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
  xor eax,eax
  cpuid
  pop rdx
  pop rax
  pop rbx
  shl rdx,32
  or rax,rdx

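Use the above code to get the timestamp before and after executing your code.
It is the most accurate method possible and easy as pie.

Note that you need to run a test at least 10 times to get a good result: on the first pass the cache will be cold, and random hard disk reads and interrupts can throw off your timings.
Because this counter is so accurate, timing only the first run can give you the wrong idea.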
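
A minimal sketch of the repeated measurement described above, for 32-bit Delphi only because it reuses the 32-bit RDTSC function. The workload ExecuteSomeCode, the Sink variable and the choice to report the fastest run are assumptions made for illustration; the answer itself only says to run the test at least 10 times.

```pascal
program RdtscRepeatDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  RunCount = 10; // the minimum number of runs suggested above

var
  Sink: Integer; // global, so the placeholder work has a visible result

// 32-bit version from above: CPUID serializes, RDTSC returns the count in EDX:EAX.
function RDTSC: Int64; assembler;
asm
  XOR eax, eax
  PUSH EBX
  CPUID
  POP EBX
  RDTSC
end;

// Placeholder workload standing in for the code under test.
procedure ExecuteSomeCode;
var
  I: Integer;
begin
  Sink := 0;
  for I := 1 to 10000 do
    Sink := Sink + I;
end;

var
  Run: Integer;
  StartCycles, Cycles, BestCycles: Int64;
begin
  BestCycles := High(Int64);
  for Run := 1 to RunCount do
  begin
    StartCycles := RDTSC;
    ExecuteSomeCode;
    Cycles := RDTSC - StartCycles;
    Writeln(Format('Run %d: %d cycles', [Run, Cycles]));
    // Keep the fastest run; cold caches and interrupts only ever add cycles.
    if Cycles < BestCycles then
      BestCycles := Cycles;
  end;
  Writeln(Format('Fastest run: %d cycles', [BestCycles]));
end.
```

Printing every run as well as the fastest one makes the cold first pass and any interrupted runs easy to spot.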

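Why you should not use QueryPerformanceCounter()
QueryPerformanceCounter() keeps measuring wall-clock time if your CPU slows down; it compensates for CPU throttling. RDTSC, by contrast, gives you the same number of cycles if your CPU slows down due to overheating or some other reason.
So if your CPU starts running hot and needs to throttle down, QueryPerformanceCounter() will say that your routine is taking more time (which is misleading), while RDTSC will say that it takes the same number of cycles (which is accurate).
This is what you want, because you are interested in the number of CPU cycles your code uses, not the wall-clock time.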

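From the latest Intel docs: http://software.intel.com/en-us/articles/measure-code-sections-using-the-enhanced-timer/?wapkw=%28rdtsc%29

Using the Processor Clocks

This timer is very accurate. On a system with a 3GHz processor, this timer can measure events that last less than one nanosecond. [...] If the frequency changes while the targeted code is running, the final reading will be redundant since the initial and final readings were not taken using the same clock frequency. The number of clock ticks that occurred during this time will be accurate, but the elapsed time will be an unknown.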
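
As a worked illustration of the relationship the quote relies on, with a cycle count N that is made up for the example: a tick count converts to elapsed time only while the clock frequency f stays constant.

```latex
t = \frac{N}{f}, \qquad
N = 3 \times 10^{6}\ \text{cycles},\quad
f = 3\ \text{GHz} = 3 \times 10^{9}\ \text{s}^{-1}
\;\Rightarrow\;
t = \frac{3 \times 10^{6}}{3 \times 10^{9}}\ \text{s} = 1\ \text{ms}
```

If f changes between the two readings, N is still exact but there is no single f to divide by, which is why the elapsed time becomes unknown.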

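When not to use RDTSC
RDTSC is useful for basic timing. If you are timing multithreaded code on a single-CPU machine, RDTSC will work fine. If you have multiple CPUs, the start count may come from one CPU and the end count from another.
So do not use RDTSC to time multithreaded code on a multi-CPU machine. On a single-CPU machine it works fine, and single-threaded code on a multi-CPU machine is fine as well.
Also remember that RDTSC counts CPU cycles. If something takes time but does not use the CPU, such as disk I/O or network access, then RDTSC is not a good tool.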
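
One common way to keep both counts on the same CPU, not mentioned in the answer and offered here only as an assumption-laden sketch, is to pin the measuring thread to a single logical CPU with the Windows API call SetThreadAffinityMask while the measurement runs. The sketch assumes a Delphi version whose Windows unit declares DWORD_PTR.

```pascal
program PinThreadDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  OldMask: DWORD_PTR;
begin
  // Restrict the current thread to the first logical CPU (bit 0 of the mask).
  // SetThreadAffinityMask returns the previous mask, or 0 on failure.
  OldMask := SetThreadAffinityMask(GetCurrentThread, 1);
  if OldMask = 0 then
    RaiseLastOSError;
  try
    // Take the start count, run the code under test, take the end count here.
    Writeln('Thread pinned to CPU 0 for the measurement');
  finally
    // Restore the previous affinity so the rest of the program is unaffected.
    SetThreadAffinityMask(GetCurrentThread, OldMask);
  end;
end.
```

This only helps for code that runs on the pinned thread; work handed off to other threads can still execute on other CPUs.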

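But the documentation says RDTSC is not accurate on modern CPUs
RDTSC is not a tool for keeping track of time; it is a tool for keeping track of CPU cycles.
For that it is the only tool that is accurate. Routines that keep track of time are not accurate on modern CPUs, because the CPU clock is not absolute like it used to be.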

score 10 · Accepted Answer

You did not specify your Delphi version, but Delphi XE has a TStopWatch declared in unit Diagnostics. It lets you measure the run time with reasonable precision.

uses
  SysUtils, Diagnostics; // SysUtils provides Format
var
  sw: TStopWatch;
begin
  sw := TStopWatch.StartNew;
  // <do something>
  Writeln(Format('runtime: %d ms', [sw.ElapsedMilliseconds]));
end;
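
ElapsedMilliseconds is a whole number, so very short routines read as 0 ms. A minimal sketch, assuming the same Diagnostics unit, of getting a fractional result from ElapsedTicks and the class property Frequency; the Sleep call is only a placeholder workload.

```pascal
program StopwatchDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Diagnostics;

var
  sw: TStopwatch;
  Seconds: Double;
begin
  sw := TStopwatch.StartNew;
  Sleep(50); // placeholder for the code under test
  sw.Stop;
  // ElapsedTicks counts in units of 1/Frequency seconds.
  Seconds := sw.ElapsedTicks / TStopwatch.Frequency;
  Writeln(Format('runtime: %.3f ms', [Seconds * 1000]));
  // Reports whether a high-resolution counter backs the stopwatch on this machine.
  Writeln('High resolution: ', TStopwatch.IsHighResolution);
end.
```

The class properties are read only after StartNew here, in case the Delphi version in use initializes them lazily.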

score 7 · Accepted Answer

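I am asking because I am currently trying to optimize a few functions

It is natural to think that measuring is how you find out what to optimize, but there is a better way.

If something takes a large enough fraction of time (F) to be worth optimizing, then if you simply pause the program at random, F is the probability that you will catch it in the act. Do that several times, and you will see precisely why it is spending that time, down to the exact lines of code.

More on that. Here is an example.

Fix it, and then take an overall measurement to see how much you saved, which should be about F. Rinse and repeat.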

score 1 · Accepted Answer

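Here are some procedures I made to check the duration of a function. I put them in a unit I called uTesting and add it to the uses clause during testing.

Declaration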

  Procedure TST_StartTiming(Index : Integer = 1);
    //Starts the timer by storing now in Time
    //Index is the index of the timer to use. 100 are available

  Procedure TST_StopTiming(Index : Integer = 1;Display : Boolean = True; DisplaySM : Boolean = False);
    //Stops the timer and stores the difference between time and now into time
    //Displays the result if Display is true
    //Index is the index of the timer to use. 100 are available

  Procedure TST_ShowTime(Index : Integer = 1;Detail : Boolean = True; DisplaySM : Boolean = False);
    //In a ShowMessage displays time
    //Uses DateTimeToStr if Detail is false else it breaks it down (H,M,S,MS)
    //Index is the index of the timer to use. 100 are available

Variables declared

var
  Time : array[1..100] of TDateTime;

Implementation

  Procedure TST_StartTiming(Index : Integer = 1);
  begin
    Time[Index] := Now;
  end; 

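  Procedure TST_StopTiming(Index : Integer = 1;Display : Boolean = True; DisplaySM : Boolean = False);
  begin
    Time[Index] := Now - Time[Index];
    //Pass Index and DisplaySM on so the requested timer is shown, not always timer 1
    if Display then TST_ShowTime(Index, True, DisplaySM);
  end;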

  Procedure TST_ShowTime(Index : Integer = 1;Detail : Boolean = True; DisplaySM : Boolean = False);
  var
    H,M,S,MS : Word;
    Msg : string;
  begin
    if Detail then
      begin
        //Break the elapsed time down into its components
        DecodeTime(Time[Index],H,M,S,MS);
        Msg := 'Hour   =   ' + IntToStr(H)  + #13#10 +
               'Min     =   ' + IntToStr(M)  + #13#10 +
               'Sec      =   ' + IntToStr(S)  + #13#10 +
               'MS      =   ' + IntToStr(MS) + #13#10;
      end
    else
      Msg := TimeToStr(Time[Index]);
    //Show a dialog when DisplaySM is true, otherwise write to the debugger output
    if DisplaySM then
      ShowMessage(Msg)
    else
      OutputDebugString(PChar(Msg));
  end;

score 0 · Accepted Answer

Use this: http://delphi.about.com/od/windowsshellapi/a/delphi-high-performance-timer-tstopwatch.htm

score 0 · Accepted Answer

clock_gettime() is high resolution, precise to nanoseconds; you can also use rdtsc, which is precise to the CPU cycle; and lastly you can simply use gettimeofday().
