c++ - Performance measurement: time vs. ticks?

Question

What is the best way to make sure a 2-thread program achieves real-time performance when it runs on 1 or 2 cores: boost::timer or RDTSC?

We started from this code:

boost::timer t;
p.f(frame);
max_time_per_frame = std::max(max_time_per_frame, t.elapsed());

... where p is an instance of Proc.

class Proc {
public:
    Proc() : _frame_counter(0) {}

    // This function must be called for each video frame and take less than 1/fps seconds.
    // 24 fps => 1/24 => about 0.042 seconds.
    void f(unsigned char * const frame)
    {
        processFrame(frame); // that's the most important part

        // This part runs every 240 frames and should not affect
        // the processFrame flow!
        if (_frame_counter % 240 == 0)
        {
            do_something_more();
        }
        _frame_counter++;
    }

private:
    unsigned int _frame_counter; // number of frames processed so far
};
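For comparison, the same maximum-per-frame measurement can be sketched with std::chrono::steady_clock from the C++11 standard library instead of boost::timer. This is only an illustration under that assumption: MaxFrameTimer and its member names are hypothetical and not part of the original program.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper: tracks the worst-case wall-clock duration of a
// repeated call, measured with a monotonic clock.
class MaxFrameTimer {
public:
    // Runs `work` once and folds its duration into the running maximum.
    template <typename Work>
    void measure(Work&& work) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        max_seconds_ = std::max(max_seconds_, elapsed.count());
        ++frames_;
    }

    // Longest single call seen so far, in seconds.
    double maxSeconds() const { return max_seconds_; }

    // Number of calls measured so far.
    unsigned long frames() const { return frames_; }

    // True if every measured call stayed below the 1/fps budget.
    bool withinBudget(double fps) const { return max_seconds_ < 1.0 / fps; }

private:
    double max_seconds_ = 0.0;
    unsigned long frames_ = 0;
};
```

A call such as timer.measure([&] { p.f(frame); }) would then wrap each frame, and timer.withinBudget(24.0) would report whether the worst frame stayed under the 24 fps budget.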

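So when it runs single-threaded on a single core, we observed that max_time_per_frame exceeds the target time because of the do_something_more processing. To remove those processing-time peaks, we run each do_something_more in a separate thread, as in the pseudocode below.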

class Proc {
public:
    Proc() : _frame_counter(0) {
        // pseudocode: start_thread, semaphore and thread are placeholders
        t = start_thread ( do_something_more_thread );
    }

    // This function must be called for each video frame and take less than 1/fps seconds.
    // 24 fps => 1/24 => about 0.042 seconds.
    void f(unsigned char * const frame)
    {
        processFrame(frame); // that's the most important part

        // This part runs every 240 frames and should not affect
        // the processFrame flow!
        if (_frame_counter % 240 == 0)
        {
            sem.up(); // wake the worker thread instead of doing the work here
        }
        _frame_counter++;
    }

    void do_something_more_thread()
    {
       while (1)
       {
            sem.down(); // block until f() signals
            do_something_more();
       }
    }

private:
    unsigned int _frame_counter;
    semaphore sem;
    thread t;
};

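I always launch my program on both 1 core and 2 cores, so I use start /AFFINITY 1 prog.exe or start /AFFINITY 3 prog.exe. From the time point of view everything is fine: max_time_per_frame stays below our target, close to an average of 0.02 seconds/frame.

However, if I use RDTSC to dump the number of ticks spent in f: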

#include <intrin.h>
...
unsigned long long getTick()
{
    return __rdtsc();
}

void f(unsigned char * const frame)
{
    unsigned long long s = getTick(); // tick count on entry

    processFrame(frame); // that's the most important part

    // This part runs every 240 frames and should not affect
    // the processFrame flow!
    if (_frame_counter % 240 == 0)
    {
        sem.up();
    }
    _frame_counter++;

    unsigned long long e = getTick(); // tick count on exit
    dump(e - s);
}

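With start /AFFINITY 3 prog.exe, max_tick_per_frame is stable. As expected, I see 1 thread (100% of 1 core), and the second thread kicks in at normal speed on the second core.

With start /AFFINITY 1 prog.exe, I see only 1 core at 100% (as expected), but the do_something_more computation does not seem to be spread over time by interleaving the execution of the two threads. In fact, at regular intervals I see a huge peak in the tick count.

So the question is: why? Is time the only meaningful measurement? Do ticks make sense when running the software on 1 core (frequency boost)?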

score 1 · Accepted Answer

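Although you will never get true real-time performance out of Windows, you can reduce the shortcomings of RDTSC by using the Windows API.

Here is a small block of code that uses the API.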

#include <Windows.h>
#include <stdio.h>

int
main(int argc, char* argv[])
{
    double timeTaken;
    LARGE_INTEGER frequency;
    LARGE_INTEGER firstCount;
    LARGE_INTEGER endCount;
    /*-- give us the highest priority available --*/
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    /*-- get the frequency of the timer we are using (ticks per second) --*/
    QueryPerformanceFrequency(&frequency);
    /*-- get the timer's current tick --*/
    QueryPerformanceCounter(&firstCount);
    /*-- some pause --*/
    Sleep(1);
    /*-- get the timer's current tick --*/
    QueryPerformanceCounter(&endCount);
    /*-- calculate time passed in milliseconds, dividing in floating point --*/
    timeTaken = (double)(endCount.QuadPart - firstCount.QuadPart) * 1000.0 / (double)frequency.QuadPart;

    printf("Time: %f ms\n", timeTaken);

    return 0;
}
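The elapsed-time arithmetic in that example can be isolated into a small portable function, which makes it easy to check by hand. This is a sketch: ticksToMilliseconds is a hypothetical helper name, and the plain 64-bit integers stand in for the QuadPart fields of LARGE_INTEGER.

```cpp
#include <cstdint>

// Converts the difference between two counter readings into milliseconds.
// `frequency` is the number of counter ticks per second and must be non-zero.
// The division is done in floating point: dividing the frequency by 1000 in
// integer arithmetic first would truncate it.
double ticksToMilliseconds(std::int64_t first_count,
                           std::int64_t end_count,
                           std::int64_t frequency) {
    const double elapsed_ticks = static_cast<double>(end_count - first_count);
    return elapsed_ticks * 1000.0 / static_cast<double>(frequency);
}
```

For example, with a counter running at 1500 ticks per second, a difference of 3000 ticks is 2000 ms. Truncating 1500 / 1000 to 1 in integer arithmetic would report 3000 ms instead.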

You can also use:

#include <Windows.h>
#include <Mmsystem.h>   /* link with winmm.lib */
if(timeBeginPeriod(1) == TIMERR_NOCANDO) {
    printf("TIMER could not be set to 1ms\n");
}
/*-- your code here --*/
timeEndPeriod(1);

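However, this changes the global Windows timer resolution to whatever interval you set (or at least tries to), so I would not recommend this approach unless you are 100% sure you are the only one using this program. It may have unintended side effects on other programs.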

score 0 · Accepted Answer

Based on the comment about REALTIME_PRIORITY_CLASS, I added the following lines to my test program.

#define NOMINMAX
#include <windows.h>
....

SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

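Now the tick counts I get from RDTSC look better: the huge peak I previously saw on 1 frame is now spread over several frames.

Since I want to keep my code portable and create some scheduling opportunities, I make the extra thread yield at certain points using:

boost::this_thread::yield();

With that change, I get the expected scheduling and RDTSC values without having to configure priorities!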
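To illustrate what yielding at certain points could look like, here is a minimal sketch that uses std::this_thread::yield, the C++11 standard-library counterpart of boost::this_thread::yield. The function sumWithYield and its chunked summing are hypothetical stand-ins for the real background work, and a yield is only a hint: whether another thread actually runs is up to the scheduler.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical background job: sums `data` in fixed-size chunks and yields
// after each chunk so that another ready thread can be scheduled.
long long sumWithYield(const std::vector<int>& data, std::size_t chunk_size) {
    if (chunk_size == 0) chunk_size = 1;  // avoid a loop that never advances
    long long total = 0;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, data.size());
        for (std::size_t i = begin; i < end; ++i) {
            total += data[i];
        }
        // Offer the rest of the time slice to other threads.
        std::this_thread::yield();
    }
    return total;
}
```

Smaller chunks give the frame-processing thread more chances to run at the cost of more yield calls, so the chunk size is a tuning knob rather than a fixed value.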

Thanks for all the help and suggestions.
