c++ - 为什么在while循环中有或没有sleep/usleep的情况下，花费在相同函数/方法调用上的时间差别很大？

Question

我原来的 c++ 演示代码如下所示：

int counter = 0;
while (counter < 5) {
    auto start = std::chrono::high_resolution_clock::now();
    // instance and result are pre-defined local variables
    instance.Search(40.055948, 116.411325, &result);
    auto end = std::chrono::high_resolution_clock::now();
    int64_t cost_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "cost_us=" << cost_us << std::endl;
    // usleep(100); // case1: sleep 100 us
    // sleep(1);    // case2: sleep 1 second
    //                 case3: no sleep at all
    counter++;
  }s

有些人可能会怀疑我对 instance.Search() 的调用会带来一些未知的东西，所以请参考下面的代码：

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <set>
#include <chrono>

void test(const std::set<int>& numbers) {

  for (int counter = 0; counter < 5; ++counter) {
    auto start = std::chrono::high_resolution_clock::now();
    auto it = numbers.lower_bound(5555555);
    auto end = std::chrono::high_resolution_clock::now();
    int64_t cost_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "cost_us=" << cost_us << std::endl;
    // usleep(100);
    // sleep(1);
  }
}

int main() {
  std::set<int> test_set;
  for (int i = 0; i < 100000000; i++) {
    test_set.insert(i);
  }
  test(test_set);
}

前置条件：用于perf stat统计cache_miss、指令等...

案例1：在while循环末尾添加usleep(100)时，处理完成后的结果是：

cost_us=5
cost_us=5
cost_us=5
cost_us=8
cost_us=6
Performance counter stats for './latency_perf_test_sleep_100_us':
    1,785,438     cache-references
      419,583     cache-misses        #   23.500 % of all cache refs
  203,832,235     cycles
  118,093,490     instructions        #   0.58 insn per cycle
   23,198,708     branches
       35,092     faults
          302     migrations

  1.031460583 seconds time elapsed

案例2：添加sleep(1)时，结果为：

cost_us=7
cost_us=65
cost_us=21
cost_us=21
cost_us=32
Performance counter stats for './latency_perf_test_sleep_1_sec':
   15,093,302     cache-references
    1,303,941     cache-misses          #   8.639 % of all cache refs
14,759,103,041    cycles
24,548,401,788    instructions          #   1.66 insn per cycle
5,062,488,529     branches
       35,372     faults
        3,444     migrations
  6.033182248 seconds time elapsed

案例 3：根本没有 sleep()/usleep()，结果是：

cost_us=5
cost_us=2
cost_us=1
cost_us=1
cost_us=1
Performance counter stats for './latency_perf_test_without_sleep':
    1,715,128 cache-references
      420,368 cache-misses              #   24.509 % of all cache refs
  209,238,013 cycles
  130,647,626 instructions              #   0.62 insn per cycle
   25,827,456 branches
       35,092 faults
          362 migrations
          
  1.032256618 seconds time elapsed

如上所述，相同函数/方法调用的时间成本在不同情况下差异很大。起初，我倾向于认为这sleep()会导致缓存未命中（我的调用使用的数据）。但是，在我taskset将我的进程与特定的 cpu 核心绑定后，差异并没有像我预期的那样消失。

我也想知道为什么添加sleep()/usleep()会导致instructions计数的急剧增加perf stat。

我没有阅读sleep()or的任何源代码，但我猜想内核可能会在进程调用or （两者都在内部调用）usleep()时做一些伎俩。sleep()usleep()nanosleep()

谁能解释这种奇怪现象背后的原因？提前致谢。

score 0 · Accepted Answer

谁能解释这种奇怪现象背后的原因？

ldd --version ldd (GNU libc) 2.17 版权所有 (C) 2012

您sleep看起来像来自 2012 sysv/linux/sleep.c 的 glibc 来源。由于内核错误（或者它是预期的 SysV 行为？我不太确定......），它SIGCHLD在调用之前阻塞，请参阅这个提交并由它 LKML 线程引用。nanosleep

最有可能的额外开销来自调用__sig*相关函数。要进一步调查，请分析代码（考虑gprof）或/和glibc使用调试信息从源代码编译您的（或者如果可以的话，只安装调试信息）然后分析代码。

c++ - 为什么在while循环中有或没有sleep/usleep的情况下，花费在相同函数/方法调用上的时间差别很大？

1 回答 1

Related

Reference