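I was trying to measure the execution time of a piece of code and noticed that the timings were about 50 ns faster when I ran the program from within my editor, QtCreator, than when I ran it from a bash shell started in a gnome-terminal. I'm using Ubuntu 20.04 as my OS.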

A small program to reproduce my problem:

#include <stdio.h>
#include <time.h>

struct timespec now() {
  struct timespec now;
  clock_gettime(CLOCK_MONOTONIC, &now);
  return now;
}

long interval_ns(struct timespec tick, struct timespec tock) {
  return (tock.tv_sec - tick.tv_sec) * 1000000000L
      + (tock.tv_nsec - tick.tv_nsec);
}

int main() {
    // sleep(1);
    for (size_t i = 0; i < 10; i++) {
        struct timespec tick = now();
        struct timespec tock = now();
        long elapsed = interval_ns(tick, tock);
        printf("It took %ld ns\n", elapsed);  // elapsed is a signed long, so %ld
    }
    return 0;
}

Output when run from within QtCreator:

It took 84 ns
It took 20 ns
It took 20 ns
It took 21 ns
It took 21 ns
It took 21 ns
It took 22 ns
It took 21 ns
It took 20 ns
It took 21 ns

And when run from my shell inside the terminal:

$ ./foo 
It took 407 ns
It took 136 ns
It took 74 ns
It took 73 ns
It took 77 ns
It took 79 ns
It took 74 ns
It took 81 ns
It took 74 ns
It took 78 ns

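Things I've tried that made no difference

  • Letting QtCreator start the program in a terminal
  • Using rdtsc and rdtscp calls instead of clock_gettime (same relative differences in run time)
  • Clearing the environment by running the program under env -i in the terminal
  • Starting the program using sh instead of bash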

I have verified that the same binary is called in all cases. I have verified that the program's nice value is 0 in all cases.

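Question

Why does starting the program from my shell make a difference? Any suggestions on what to try?

Update

  • If I add a sleep(1) call at the beginning of main, both the QtCreator and the gnome-terminal/bash invocations report the longer execution times.

  • If I add a system("ps -H") call at the beginning of main, but drop the previously mentioned sleep(1), both invocations report the shorter execution times (~20 ns).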

1 Answer

Just add more iterations to give the CPU time to ramp up to max clock speed. Your "slow" times are with the CPU at its low-power idle clock speed.

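QtCreator apparently uses enough CPU time to make this happen before your program runs, or else you're compiling + running and the compilation process serves as a warm-up. (By comparison, bash's fork/execve is much lighter weight.)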

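See "Idiomatic way of performance evaluation?" for more about doing warm-up runs when benchmarking, and also "Why does this delay-loop start to run faster after several iterations with no sleep?"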

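On my i7-6700k (Skylake) running Linux, increasing the loop iteration count to 1000 is enough to get the final iterations running at full clock speed, even after the first few iterations handle page faults and warm up the iTLB, uop cache, data caches, and so on.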

$ ./a.out      
It took 244 ns
It took 150 ns
It took 73 ns
It took 76 ns
It took 75 ns
It took 71 ns
It took 72 ns
It took 72 ns
It took 69 ns
It took 75 ns
...
It took 74 ns
It took 68 ns
It took 69 ns
It took 72 ns
It took 72 ns        # 382 "slow" iterations in this test run (copy/paste into wc to check)
It took 15 ns
It took 15 ns
It took 15 ns
It took 15 ns
It took 16 ns
It took 16 ns
It took 15 ns
It took 15 ns
It took 15 ns
It took 15 ns
It took 14 ns
It took 16 ns
...

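On my system, energy_performance_preference is set to balance_performance, so the hardware P-state governor is less aggressive than it would be with performance. Use grep . /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference to check the current setting, and sudo to change it: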

sudo sh -c 'for i in /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference;do echo balance_performance > "$i";done'

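Even just running it under perf stat ./a.out is enough to ramp up to max clock speed very quickly, though; it really doesn't take much. But bash's command parsing after you press return is very cheap, so not much CPU work is done before it calls execve and reaches main in your new process.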

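The printf with line-buffered output is what takes most of the CPU time in your program, BTW. That's why it takes so few iterations to ramp up to speed. For example, if you run perf stat --all-user -r10 ./a.out, you'll see that user-space core clock cycles per second are only about 0.4 GHz, with the rest of the time spent in the kernel in write system calls.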

answered 2020-08-03T20:06:41.627