c++ - Timer function to provide time in nanoseconds using C++

Question

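I want to measure the time an API takes to return a value. The time taken by such an action is on the order of nanoseconds. Since the API is a C++ class/function, I am using timer.h to calculate it: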

  #include <ctime>
  #include <iostream>

  using namespace std;

  int main(int argc, char** argv) {

      clock_t start;
      double diff;
      start = clock();
      diff = ( std::clock() - start ) / (double)CLOCKS_PER_SEC;
      cout<<"printf: "<< diff <<'\n';

      return 0;
  }

The above code gives the time in seconds. How do I get the same result in nanoseconds, with more precision?

score 87 · Accepted Answer

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD), you want to use clock_gettime().

#include <time.h>   // clock_gettime() and timespec are declared here

int main()
{
   timespec ts;
   // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
   clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
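As a rough sketch of how the call above becomes an elapsed-time measurement, one can read the clock before and after the code under test and subtract. The helper names `now_ns` and `elapsed_ns` are made up for this illustration, and choosing CLOCK_MONOTONIC is my assumption for interval timing, since it is not affected by wall-clock adjustments.

```cpp
#include <cstdint>
#include <time.h>   // clock_gettime, timespec (POSIX)

// Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
inline std::int64_t now_ns()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: nanoseconds spent inside f(), including the
// overhead of the two clock reads.
template <class F>
std::int64_t elapsed_ns(F f)
{
    const std::int64_t t0 = now_ns();
    f();
    return now_ns() - t0;
}
```

Calling `elapsed_ns([] { my_api_call(); })` (where `my_api_call` stands in for your own function) would return the interval in nanoseconds. For very short functions the clock-read overhead can dominate, so looping is still advisable.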

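For Windows, you want to use QueryPerformanceCounter. And here is more on QPC.

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMD CPUs may also cause a problem. See the second post by sebbbi, where he states:

QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example, in Windows XP, all AMD Athlon X2 dual-core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install the AMD dual-core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).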

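EDIT 2013/07/16:

It looks like there is some controversy about the efficacy of QPC under certain circumstances, as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...

However, this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP Service Pack 2.

This article shows that Windows 7 can determine whether the processor(s) have an invariant TSC, and falls back to an external timer if they don't: http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.

Other fine reading related to timers:

See the comments for more details.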

score 71 · Accepted Answer

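This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I'd show how to use RDTSC with <chrono>. In addition, I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter).

Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.

First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock: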

#include <chrono>

namespace x
{

struct clock
{
    typedef unsigned long long                 rep;
    typedef std::ratio<1, 2'800'000'000>       period; // My machine is 2.8 GHz
    typedef std::chrono::duration<rep, period> duration;
    typedef std::chrono::time_point<clock>     time_point;
    static const bool is_steady =              true;

    static time_point now() noexcept
    {
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return time_point(duration(static_cast<rep>(hi) << 32 | lo));
    }
};

}  // x

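All this clock does is count CPU cycles and store the count in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).

To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile-time constant, even though your machine may change clock speed in different power modes. From those fundamentals you can easily define your clock's "native" duration and time point.

If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. In that case, the more accurately you are able to supply the clock speed, the more accurate the conversion to nanoseconds (milliseconds, whatever) will be.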
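To make the role of the period constant concrete, here is a minimal sketch of the tick-to-nanosecond conversion in isolation. It assumes the same 2.8 GHz rate as the example clock. The names `tsc_ticks` and `ticks_to_ns` are mine, not part of the answer's code.

```cpp
#include <chrono>
#include <cstdint>

// Assumed tick rate of 2.8 GHz, matching the period used for x::clock.
using tsc_ticks = std::chrono::duration<std::uint64_t, std::ratio<1, 2800000000>>;

// Convert a raw tick count to whole nanoseconds (fractions are truncated).
constexpr std::int64_t ticks_to_ns(std::uint64_t ticks)
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(tsc_ticks(ticks)).count();
}
```

With this period, 2,800,000,000 ticks convert to exactly one second and 28 ticks to 10 ns. If the real clock rate differs from the constant, every converted value is off by the same factor, which is why the accuracy of the constant matters.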

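Below is example code which shows how to use x::clock. Actually I've templated the code on the clock, as I'd like to show how you can use many different clocks with the exact same syntax. This particular test shows what the looping overhead is when you run the thing you want to time inside a loop: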

#include <iostream>

template <class clock>
void
test_empty_loop()
{
    // Define real time units
    typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
    // or:
    // typedef std::chrono::nanoseconds nanoseconds;
    // Define double-based unit of clock tick
    typedef std::chrono::duration<double, typename clock::period> Cycle;
    using std::chrono::duration_cast;
    const int N = 100000000;
    // Do it
    auto t0 = clock::now();
    for (int j = 0; j < N; ++j)
        asm volatile("");
    auto t1 = clock::now();
    // Get the clock ticks per iteration
    auto ticks_per_iter = Cycle(t1-t0)/N;
    std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
    // Convert to real time units
    std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
              << "ps per iteration\n";
}

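The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any unit you like, either integral or floating-point based. As an example, there is a pre-made std::chrono::nanoseconds unit I could have used.

As another example, I want to print out the average number of clock ticks per iteration as a floating-point number, so I create another duration, based on double, that has the same units as the clock's tick (called Cycle in the code).

The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function, it is: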

typename clock::time_point t0 = clock::now();

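(as clearly shown in the x::clock example, and this is also true of the system-supplied clocks).

To get a duration in terms of floating-point clock ticks, one merely subtracts the two time points; to get the per-iteration value, divide that duration by the number of iterations.

You can get the count in any duration by using the count() member function. This returns the internal representation. Finally, I use std::chrono::duration_cast to convert the Cycle duration to the picoseconds duration and print that out.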
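The difference between integer-based and floating-point-based durations can be sketched in a few lines. The function names below are hypothetical and only illustrate count() and duration_cast on a fixed input.

```cpp
#include <chrono>

// Whole microseconds: the integer-based cast truncates the fraction.
constexpr long long whole_microseconds(std::chrono::nanoseconds d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// Fractional microseconds: a double-based duration keeps the remainder,
// and the conversion is implicit because no precision is lost.
constexpr double fractional_microseconds(std::chrono::nanoseconds d)
{
    return std::chrono::duration<double, std::micro>(d).count();
}
```

For an input of 1500 ns the first function yields 1 and the second 1.5, which is why the per-iteration tick count in the test above is held in a double-based duration.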

Using this code is simple:

int main()
{
    std::cout << "\nUsing rdtsc:\n";
    test_empty_loop<x::clock>();

    std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
    test_empty_loop<std::chrono::high_resolution_clock>();

    std::cout << "\nUsing std::chrono::system_clock:\n";
    test_empty_loop<std::chrono::system_clock>();
}

Above I exercise the test with our home-made x::clock, and compare those results with two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:

Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration

Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration

Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration

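This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However, when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).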
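The differing tick counts follow directly from each clock's compile-time period, which any C++11 clock exposes. A small helper (the name `ticks_per_second` is an assumption of mine) makes that visible:

```cpp
#include <chrono>

// Ticks per second implied by a clock's compile-time period.
// Works for any type with a std::ratio-like nested `period`.
template <class Clock>
constexpr double ticks_per_second()
{
    return static_cast<double>(Clock::period::den) / Clock::period::num;
}
```

Printing `ticks_per_second<x::clock>()` next to the values for the two system clocks would show the resolution each clock claims. Note that the period is only the unit of the representation, not a guarantee of how often the underlying counter really updates.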

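Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:

1. The clock speed of my machine, in order to define x::clock.
2. The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.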

score 29 · Accepted Answer

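With that level of accuracy, it would be better to reason in CPU ticks rather than in system calls like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction... having nanosecond accuracy is pretty much impossible.

Still, something like this is a start:

This is actual code to retrieve the number of 80x86 CPU clock ticks elapsed since the CPU was last started. It works on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can probably be ported very easily to anything else, as long as it supports inline assembly.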

inline __int64 GetCpuClocks()
{

    // Counter
    struct { __int32 low, high; } counter;  // __int32 is the MSVC built-in 32-bit type

    // Use RDTSC instruction to get clocks count
    __asm push EAX
    __asm push EDX
    __asm __emit 0fh __asm __emit 031h // RDTSC
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    // Return result
    return *(__int64 *)(&counter);

}

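This function also has the advantage of being extremely fast: it usually takes no more than 50 CPU cycles to execute.

Using the timing:
If you need to translate the clock counts into true elapsed time, divide the result by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call QueryPerformanceFrequency().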
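The division described above is simple arithmetic. A sketch follows, with the clock rate passed in as a parameter since the real value has to be measured on your machine. `cycles_to_ns` is a name I made up for the illustration.

```cpp
#include <cstdint>

// Elapsed nanoseconds for a cycle count, given the clock rate in Hz.
// The rate is an input because the "rated" speed may not match reality.
inline double cycles_to_ns(std::uint64_t cycles, double hz)
{
    return static_cast<double>(cycles) * 1e9 / hz;
}
```

For example, 50 cycles on a 2 GHz part come out at 25 ns. If the CPU changes frequency during the measurement, the result is only as good as the assumed rate.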

score 23 · Accepted Answer

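To do this correctly you can use one of two ways: either RDTSC or clock_gettime(). The second is about 2 times faster and has the advantage of giving the correct absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors).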

inline uint64_t rdtsc()
{
    uint32_t lo, hi;
    __asm__ __volatile__ (
      "xorl %%eax, %%eax\n"
      "cpuid\n"
      "rdtsc\n"
      : "=a" (lo), "=d" (hi)
      :
      : "%ebx", "%ecx" );
    return (uint64_t)hi << 32 | lo;
}

And for clock_gettime (I chose microsecond resolution arbitrarily):

#include <time.h>
#include <sys/timeb.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}

The timing and values produced:

Absolute values:
rdtsc           = 4571567254267600
clock_gettime   = 1278605535506855

Processing time: (10000000 runs)
rdtsc           = 2292547353
clock_gettime   = 1031119636

score 22 · Accepted Answer

I am using the following to get the desired results:

#include <time.h>
#include <iostream>
using namespace std;

int main (int argc, char** argv)
{
    // reset the clock
    timespec tS;
    tS.tv_sec = 0;
    tS.tv_nsec = 0;
    clock_settime(CLOCK_PROCESS_CPUTIME_ID, &tS);
    ...
    ... <code to check for the time to be put here>
    ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tS);
    cout << "Time taken is: " << tS.tv_sec << " " << tS.tv_nsec << endl;

    return 0;
}
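A variant that does not rely on resetting the clock is to take two readings and subtract them. I am assuming here that clock_settime() may be refused for CLOCK_PROCESS_CPUTIME_ID on some systems, and I have not checked every platform. The helper name `process_cpu_ns` is hypothetical.

```cpp
#include <cstdint>
#include <time.h>   // clock_gettime, timespec (POSIX)

// Hypothetical helper: CPU time consumed by this process so far, in ns.
inline std::int64_t process_cpu_ns()
{
    timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

Reading `process_cpu_ns()` before and after the code under test and subtracting the two values gives the CPU time used in between. Because this clock counts CPU time rather than wall time, time spent sleeping or blocked is not included.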

score 8 · Accepted Answer

For C++11, here is a simple wrapper:

#include <iostream>
#include <chrono>

class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const {
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

Or for C++03 on *nix:

class Timer
{
public:
    Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

    double elapsed() {
        clock_gettime(CLOCK_REALTIME, &end_);
        return end_.tv_sec - beg_.tv_sec +
            (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
    }

    void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

private:
    timespec beg_, end_;
};

Example of usage:

int main()
{
    Timer tmr;
    double t = tmr.elapsed();
    std::cout << t << std::endl;

    tmr.reset();
    t = tmr.elapsed();
    std::cout << t << std::endl;
    return 0;
}

From https://gist.github.com/gongzhitaao/7062087

score 5 · Accepted Answer

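In general, to time how long a function call takes, you want to call it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions, and you don't know how long that takes.

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.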
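The loop-and-divide approach can be sketched with std::chrono::steady_clock as follows. The function name `average_ns_per_call` is my own, and the sketch assumes a C++11 compiler.

```cpp
#include <chrono>

// Average nanoseconds per call of f over `iterations` calls.
// The two clock reads are amortized over the whole loop.
template <class F>
double average_ns_per_call(F f, long iterations)
{
    typedef std::chrono::steady_clock clock;
    const clock::time_point t0 = clock::now();
    for (long i = 0; i < iterations; ++i)
        f();
    const std::chrono::duration<double, std::nano> total = clock::now() - t0;
    return total.count() / iterations;
}
```

Be aware that an optimizing compiler may remove a call whose result is unused, so the function under test should have an observable effect (for example, writing to a volatile variable). The loop overhead itself is also included in the average.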

score 5 · Accepted Answer

You can use the following function with gcc running on x86 processors:

unsigned long long rdtsc()
{
  #define rdtsc(low, high) \
         __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))

  unsigned int low, high;
  rdtsc(low, high);
  return ((unsigned long long)high << 32) | low;
}

With Digital Mars C++:

unsigned long long rdtsc()
{
   _asm
   {
        rdtsc
   }
}

which reads the high-performance timer on the chip. I use this when doing profiling.

score 3 · Accepted Answer

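If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for your operating system. POSIX supports up to microseconds with gettimeofday, but nothing more precise, since computers did not have frequencies above 1 GHz.

If you are using Boost, you can check boost::posix_time.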

score 3 · Accepted Answer

I am using Borland code here. ti_hund sometimes gives me a negative number, but the timing is fairly good.

#include <dos.h>

void main() 
{
struct  time t;
int Hour,Min,Sec,Hun;
gettime(&t);
Hour=t.ti_hour;
Min=t.ti_min;
Sec=t.ti_sec;
Hun=t.ti_hund;
printf("Start time is: %2d:%02d:%02d.%02d\n",
   t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
....
your code to time
...

// read the time here remove Hours and min if the time is in sec

gettime(&t);
printf("\nTid Hour:%d Min:%d Sec:%d  Hundreds:%d\n",t.ti_hour-Hour,
                             t.ti_min-Min,t.ti_sec-Sec,t.ti_hund-Hun);
printf("\n\nAlt Ferdig Press a Key\n\n");
getch();
} // end main

score 3 · Accepted Answer

Using Brock Adams's method, with a simple class:

int get_cpu_ticks()
{
    LARGE_INTEGER ticks;
    QueryPerformanceFrequency(&ticks);
    return ticks.LowPart;
}

__int64 get_cpu_clocks()
{
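    struct { __int32 low, high; } counter;

    // cpuid serializes execution so rdtsc is not reordered with earlier
    // instructions; MSVC tracks the registers used in __asm statements,
    // so no manual push/pop is needed.
    __asm cpuid
    __asm rdtsc
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX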

    return *(__int64 *)(&counter);
}

class cbench
{
public:
    cbench(const char *desc_in) 
         : desc(strdup(desc_in)), start(get_cpu_clocks()) { }
    ~cbench()
    {
        printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
        if(desc) free(desc);
    }
private:
    char *desc;
    __int64 start;
};

Usage example:

int main()
{
    {
        cbench c("test");
        ... code ...
    }
    return 0;
}

Result:

test took: 0.0002 ms

There is some function-call overhead, but it should still be more than fast enough :)
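For compilers without MSVC-style inline assembly, the same scope-based idea can be sketched with std::chrono. This is a hypothetical portable variant, not the answer's class: `scope_bench` is a made-up name, and it measures wall time with steady_clock rather than counting CPU cycles.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical portable variant: prints how long the enclosing scope took.
class scope_bench
{
public:
    explicit scope_bench(std::string desc)
        : desc_(std::move(desc)), start_(std::chrono::steady_clock::now()) {}

    ~scope_bench()
    {
        // A double-based millisecond duration keeps the fractional part.
        const std::chrono::duration<double, std::milli> ms =
            std::chrono::steady_clock::now() - start_;
        std::printf("%s took: %.4f ms\n", desc_.c_str(), ms.count());
    }

    // Copying would report the same scope twice, so it is disabled.
    scope_bench(const scope_bench&) = delete;
    scope_bench& operator=(const scope_bench&) = delete;

private:
    std::string desc_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage mirrors the example above: declare `scope_bench c("test");` at the top of a block and the elapsed time is printed when the block exits. Using std::string avoids the manual strdup/free pair.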

score 3 · Accepted Answer

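You can use Embedded Profiler (free for Windows and Linux), which has an interface to a multiplatform timer (counting processor cycles) and can give you the number of cycles per second: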

EProfilerTimer timer;
timer.Start();

... // Your code here

const uint64_t number_of_elapsed_cycles = timer.Stop();
const uint64_t nano_seconds_elapsed =
    number_of_elapsed_cycles / (double) timer.GetCyclesPerSecond() * 1000000000;

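Converting a cycle count to time can be a dangerous operation on modern processors, where the CPU frequency can change dynamically. Therefore, to be sure that converted times are correct, it is necessary to fix the processor frequency before profiling.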

score 2 · Accepted Answer

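If this is for Linux, I've been using the function "gettimeofday", which returns a struct that gives the seconds and microseconds since the Epoch. You can then use timersub to subtract the two to get the difference in time, and convert it to whatever precision of time you want. However, you specify nanoseconds, and it looks like the function clock_gettime() is what you're looking for. It puts the time, in seconds and nanoseconds, into the structure you pass to it.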
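The gettimeofday/timersub combination could look roughly like this. `elapsed_us` is a hypothetical helper name, and timersub is a BSD extension that glibc also provides in <sys/time.h>, so availability on other platforms is an assumption.

```cpp
#include <sys/time.h>   // gettimeofday, timeval, timersub (BSD/glibc extension)

// Hypothetical helper: microseconds elapsed between two timeval readings,
// each typically obtained with gettimeofday(&tv, NULL).
inline long long elapsed_us(const timeval& start, const timeval& end)
{
    timeval diff;
    timersub(&end, &start, &diff);   // diff = end - start, with tv_usec normalized
    return static_cast<long long>(diff.tv_sec) * 1000000LL + diff.tv_usec;
}
```

timersub handles the borrow when the microsecond field of the end time is smaller than that of the start time, which is easy to get wrong by hand. The resolution is still microseconds at best, so clock_gettime() remains the better fit for nanoseconds.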

score 2 · Accepted Answer

What do you think about this:

    int iceu_system_GetTimeNow(long long int *res)
    {
      static struct timespec buffer;
      // 
    #ifdef __CYGWIN__
      if (clock_gettime(CLOCK_REALTIME, &buffer))
        return 1;
    #else
      if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
        return 1;
    #endif
      *res=(long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
      return 0;
    }

score 2 · Accepted Answer

Here is a nice Boost timer that works well:

//Stopwatch.hpp

#ifndef STOPWATCH_HPP
#define STOPWATCH_HPP

//Boost
#include <boost/chrono.hpp>
//Std
#include <cstdint>

class Stopwatch
{
public:
    Stopwatch();
    virtual         ~Stopwatch();
    void            Restart();
    std::uint64_t   Get_elapsed_ns();
    std::uint64_t   Get_elapsed_us();
    std::uint64_t   Get_elapsed_ms();
    std::uint64_t   Get_elapsed_s();
private:
    boost::chrono::high_resolution_clock::time_point _start_time;
};

#endif // STOPWATCH_HPP


//Stopwatch.cpp

#include "Stopwatch.hpp"

Stopwatch::Stopwatch():
    _start_time(boost::chrono::high_resolution_clock::now()) {}

Stopwatch::~Stopwatch() {}

void Stopwatch::Restart()
{
    _start_time = boost::chrono::high_resolution_clock::now();
}

std::uint64_t Stopwatch::Get_elapsed_ns()
{
    boost::chrono::nanoseconds nano_s = boost::chrono::duration_cast<boost::chrono::nanoseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(nano_s.count());
}

std::uint64_t Stopwatch::Get_elapsed_us()
{
    boost::chrono::microseconds micro_s = boost::chrono::duration_cast<boost::chrono::microseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(micro_s.count());
}

std::uint64_t Stopwatch::Get_elapsed_ms()
{
    boost::chrono::milliseconds milli_s = boost::chrono::duration_cast<boost::chrono::milliseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(milli_s.count());
}

std::uint64_t Stopwatch::Get_elapsed_s()
{
    boost::chrono::seconds sec = boost::chrono::duration_cast<boost::chrono::seconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(sec.count());
}

score 2 · Accepted Answer

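Minimalistic copy-and-paste struct + lazy usage

If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste it anywhere in your C++ file right after the #include's. This is the only instance in which I sacrifice the Allman-style formatting.

You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.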

#include <chrono>
#include <iostream>  // std::cout
#include <vector>    // std::vector
struct MeasureTime
{
    using precision = std::chrono::microseconds;
    std::vector<std::chrono::steady_clock::time_point> times;
    std::chrono::steady_clock::time_point oneLast;
    void p() {
        std::cout << "Mark " 
                << times.size()/2
                << ": " 
                << std::chrono::duration_cast<precision>(times.back() - oneLast).count() 
                << std::endl;
    }
    void m() {
        oneLast = times.back();
        times.push_back(std::chrono::steady_clock::now());
    }
    void t() {
        m();
        p();
        m();
    }
    MeasureTime() {
        times.push_back(std::chrono::steady_clock::now());
    }
};

Usage

MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...

Standard output result

Mark 1: 123
Mark 2: 32
Mark 3: 433234

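If you want a summary after execution

If you want the report afterwards, for example because the code in between also writes to standard output, then add the following function to the struct (just before MeasureTime()):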

void s() { // summary
    int i = 0;
    std::chrono::steady_clock::time_point tprev;
    for(auto tcur : times)
    {
        if(i > 0)
        {
            std::cout << "Mark " << i << ": "
                    << std::chrono::duration_cast<precision>(tcur - tprev).count()
                    << std::endl;
        }
        tprev = tcur;
        ++i;
    }
}

Then you can use:

MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();

It will list all the marks just as before, but after the other code has executed. Note that you shouldn't use both m.s() and m.t().

score 0 · Accepted Answer

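plf::nanotimer is a lightweight option for this that works on Windows, Linux, Mac, BSD and so on. Depending on the OS, its accuracy is roughly on the order of microseconds: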

  #include "plf_nanotimer.h"
  #include <iostream>

  int main(int argc, char** argv)
  {
      plf::nanotimer timer;

      timer.start();

      // Do something here

      double results = timer.get_elapsed_ns();
      std::cout << "Timing: " << results << " nanoseconds." << std::endl;    
      return 0;
  }
