c++ - C++ clock stays zero

Question

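I am trying to get the elapsed time of my program. I thought I should use clock() from time.h, but it stays at zero in every phase of the program, even though I am adding 10^5 numbers (which must consume some CPU time). I have searched for this problem, and it seems that only people running Linux run into it. I am running Ubuntu 12.04 LTS.

I am going to compare AVX and SSE instructions, so using time_t is not really an option. Any hints?

Here is the code: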

 //Dimension of Arrays
unsigned int N = 100000;
//Fill two arrays with random numbers
unsigned  int a[N];
clock_t start_of_programm = clock();
for(int i=0;i<N;i++){
    a[i] = i;
}
clock_t after_init_of_a = clock();
unsigned  int b[N];
for(int i=0;i<N;i++){
    b[i] = i;
}
clock_t after_init_of_b = clock();

//Add the two arrays with Standard
unsigned int out[N];
for(int i = 0; i < N; ++i)
    out[i] = a[i] + b[i];
clock_t after_add = clock();

cout  << "start_of_programm " << start_of_programm  << endl; // prints
cout  << "after_init_of_a " << after_init_of_a  << endl; // prints
cout  << "after_init_of_b " << after_init_of_b  << endl; // prints
cout  << "after_add " << after_add  << endl; // prints
cout  << endl << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << endl;

And here is the console output. I also tried printf() with %d; it made no difference.

start_of_programm 0
after_init_of_a 0
after_init_of_b 0
after_add 0

CLOCKS_PER_SEC 1000000

score 5 · Accepted Answer

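clock does indeed return the CPU time used, but its granularity is on the order of 10 Hz. So if your code takes no more than 100 ms, you will get zero. And unless it runs significantly longer than 100 ms, you will not get a very accurate value, because your margin of error will be around 100 ms.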
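A quick way to see the effective step of clock() on a particular machine is to spin until its return value changes. This is only a sketch: it assumes clock() is available and advancing on the system, and the helper name clock_step_seconds is made up for illustration.

```cpp
#include <ctime>

// Busy-waits until clock() reports a new value and returns the size of that
// step in seconds. Hypothetical helper, not part of any library.
// Assumes clock() works on this system (it returns -1 if CPU time is
// unavailable, in which case this loop would never finish).
double clock_step_seconds()
{
    const std::clock_t first = std::clock();
    std::clock_t next = first;
    while (next == first) {
        next = std::clock();
    }
    return static_cast<double>(next - first) / CLOCKS_PER_SEC;
}
```

Any timed region shorter than the value this returns can show up as a difference of zero.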

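So your options are to increase N or to use a different method of measuring time. std::chrono will most likely give a more accurate time (but it measures wall time, not CPU time).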

#include <time.h>  // clock_gettime, timespec (POSIX)

// Returns t2 - t1 in seconds.
double timespec_diff(timespec t2, timespec t1)
{
    double d1 = t1.tv_sec + t1.tv_nsec / 1000000000.0;
    double d2 = t2.tv_sec + t2.tv_nsec / 1000000000.0;

    return d2 - d1;
}

timespec t1, t2;
clock_gettime(CLOCK_REALTIME, &t1);
// ... do stuff ...
clock_gettime(CLOCK_REALTIME, &t2);
double t = timespec_diff(t2, t1);
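The std::chrono alternative mentioned in this answer could look roughly like the sketch below. It assumes a C++11 compiler; the helper name wall_seconds is invented for illustration, and std::chrono::steady_clock is chosen here because it is monotonic. Like clock_gettime above, it measures wall time, not CPU time.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall time in seconds.
// Hypothetical helper; `work` is any callable taking no arguments.
template <typename F>
double wall_seconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    // duration<double> defaults to seconds as the unit.
    return std::chrono::duration<double>(stop - start).count();
}
```

It might be called as `double t = wall_seconds([&] { /* loop to be timed */ });`.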

score 2 · Accepted Answer

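The simplest way to get the time is to just use the stub function from OpenMP. This works with MSVC, GCC and ICC. With MSVC you don't even need to enable OpenMP. With ICC you can link only the stubs if you like, using -openmp-stubs. With GCC you have to use -fopenmp.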

#include <omp.h>
#include <stdio.h>

double dtime;
dtime = omp_get_wtime();          // wall-clock time in seconds
foo();                            // the code being timed
dtime = omp_get_wtime() - dtime;
printf("time %f\n", dtime);

score 1 · Accepted Answer

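First, the compiler is very likely optimizing your code away. Check your compiler's optimization options.

Since the subsequent code never uses the arrays out[], a[] and b[], and no value from out[], a[] or b[] is ever output, the compiler will optimize away the code blocks below, as if they were never executed: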

for(int i=0;i<N;i++){
    a[i] = i;
}

for(int i=0;i<N;i++){
    b[i] = i;
}

for(int i = 0; i < N; ++i)
    out[i] = a[i] + b[i];

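Since clock() returns CPU time, the code above consumes almost no time once it has been optimized.

One more thing: set N to a larger value. 100000 is too small for a performance test; today's computers run O(n) code at a scale of 100000 very quickly.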

unsigned int N = 10000000;
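One caveat with an N this large: the question declares its arrays on the stack, and 10,000,000 elements come to about 40 MB per array (assuming a 4-byte unsigned int), which typically exceeds the default stack limit on Linux. A minimal sketch that puts the data on the heap instead, using std::vector; the function name make_sequence is hypothetical:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Returns a heap-allocated array holding 0, 1, ..., n-1, mirroring the
// a[i] = i initialisation from the question. Hypothetical helper.
std::vector<unsigned int> make_sequence(std::size_t n)
{
    std::vector<unsigned int> v(n);
    std::iota(v.begin(), v.end(), 0u);  // fill with consecutive values
    return v;
}
```

The arrays a and b from the question could then be created as `make_sequence(N)`, and out as `std::vector<unsigned int>(N)`.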

score 0 · Accepted Answer

Add this to the end of your code:

unsigned long long sum = 0;  // wide enough that the total does not overflow
for(unsigned int i = 0; i<N; i++)
    sum += out[i];
cout << sum;

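Then you will see the times.

Because you never use a[], b[] or out[], the corresponding for loops are ignored. This is due to compiler optimization.

Also, to see the exact time it takes, build in debug mode instead of release; then you will be able to see the time it takes.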
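Putting this idea together, a self-contained sketch might look like the following. It is an illustration under assumptions, not the asker's code: the function name timed_add_cpu_seconds and its signature are made up, std::vector replaces the stack arrays, and the checksum is returned to the caller so the loops are not trivially dead code. A sufficiently aggressive optimizer may still transform the loops, so the result is best treated as a rough measurement.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <vector>

// Adds two n-element arrays and returns the CPU time spent in the addition
// loop, in seconds, as reported by clock(). The sum of the results is
// written to `checksum` so that the computed values are actually used.
// Hypothetical helper for illustration only.
double timed_add_cpu_seconds(std::size_t n, std::uint64_t& checksum)
{
    std::vector<unsigned int> a(n), b(n), out(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<unsigned int>(i);
        b[i] = static_cast<unsigned int>(i);
    }

    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    const std::clock_t stop = std::clock();

    // Consume the output, as the snippet above does with `sum`.
    checksum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        checksum += out[i];
    }
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

For small n the returned time can still be 0 because of the granularity of clock(); printing the checksum along with the time keeps the result observable.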
