c++ - C++ program performs better when piped

Question

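I haven't done any programming in a decade. I wanted to get back into it, so I wrote this pointless little program as practice. The easiest way to describe what it does is with the output of my --help code block: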

./prng_bench --help

./prng_bench: usage: ./prng_bench $N $B [$T]

   This program will generate an N digit base(B) random number until
all N digits are the same. 

Once a repeating N digit base(B) number is found, the following statistics are displayed:
  -Decimal value of all N digits.
  -Time & number of tries taken to randomly find.

Optionally, this process is repeated T times. 
   When running multiple repititions, averages for all N digit base(B)
numbers are displayed at the end, as well as total time and total tries.

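My "problem" is that when the problem is "easy", say a 3 digit base 10 number, and I have it do a large number of passes, the "total time" is less when the output is piped to grep. For example:

command ; command | grep took: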

./prng_bench 3 10 999999 ; ./prng_bench 3 10 999999|grep took

....
Pass# 999999: All 3 base(10) digits =  3 base(10).   Time:    0.00005 secs.   Tries: 23
It took 191.86701 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00019 secs & 99 tries was needed to find each one. 

It took 159.32355 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.

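If I run the same command multiple times without grep, the times are always very close. I'm using srand(1234) for now, for testing. The code between my start and stop calls to clock_gettime() does not involve any stream operations, which would obviously affect the time. I realize this is an exercise in futility, but I'd like to know why it behaves this way. Below is the heart of the program. If anyone wants to compile and test, here is a link to the full source on Dropbox: https://www.dropbox.com/s/bczggar2pqzp9g1/prng_bench.cpp clock_gettime() requires -lrt.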

for (int pass_num=1; pass_num<=passes; pass_num++) {   //Executes $passes # of times.
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &temp_time);  //get time
  start_time = timetodouble(temp_time);                //convert time to double, store as start_time
  for(i=1, tries=0; i!=0; tries++) {    //loops until 'comparison for' fully completes. counts reps as 'tries'.  <------------
    for (i=0; i<Ndigits; i++)      //Move forward through array.                                                              |
      results[i]=(rand()%base);    //assign random num of base to element (digit).                                            |
    /*for (i=0; i<Ndigits; i++)     //---Debug Lines---------------                                                           |
      std::cout<<" "<<results[i];   //---a LOT of output.----------                                                           |
    std::cout << "\n";              //---Comment/decoment to disable/enable.*/   //                                           |
    for (i=Ndigits-1; i>0 && results[i]==results[0]; i--); //Move through array, != element breaks & i!=0, new digits drawn. -|
  }                                                        //If all are equal i will be 0, nested for condition satisfied.  -|
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &temp_time);  //get time
  draw_time = (timetodouble(temp_time) - start_time);  //convert time to dbl, subtract start_time, set draw_time to diff.
  total_time += draw_time;    //add time for this pass to total.
  total_tries += tries;       //add tries for this pass to total.
  /*Formated output for each pass:
    Pass# ---: All -- base(--) digits = -- base(10)   Time:   ----.---- secs.    Tries: ----- (LINE) */
  std::cout<<"Pass# "<<std::setw(width_pass)<<pass_num<<": All "<<Ndigits<<" base("<<base<<") digits = "
           <<std::setw(width_base)<<results[0]<<" base(10).   Time: "<<std::setw(width_time)<<draw_time
           <<" secs.   Tries: "<<tries<<"\n";
}
if(passes==1) return 0;        //No need for totals and averages of 1 pass.
/* It took ----.---- secs & ------ tries to find --- repeating -- digit base(--) numbers. (LINE)
 An average of ---.---- secs & ---- tries was needed to find each one. (LINE)(LINE) */
 std::cout<<"It took "<<total_time<<" secs & "<<total_tries<<" tries to find "
          <<passes<<" repeating "<<Ndigits<<" digit base("<<base<<") numbers.\n"
          <<"An average of "<<total_time/passes<<" secs & "<<total_tries/passes
          <<" tries was needed to find each one. \n\n";
return 0;
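The loop above relies on a timetodouble helper whose body is only in the linked source. As a rough sketch of what such a conversion might look like (the names timespec_to_seconds and cpu_seconds_now are hypothetical, not taken from the original program):

```cpp
#include <ctime>   // timespec, clock_gettime, CLOCK_PROCESS_CPUTIME_ID (POSIX)

// Hypothetical stand-in for the question's timetodouble():
// converts a timespec to seconds as a double.
double timespec_to_seconds(const timespec& t) {
    return static_cast<double>(t.tv_sec) +
           static_cast<double>(t.tv_nsec) / 1e9;
}

// Hypothetical convenience wrapper: CPU time consumed by this process so far.
double cpu_seconds_now() {
    timespec now{};
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &now);
    return timespec_to_seconds(now);
}
```

Note that CLOCK_PROCESS_CPUTIME_ID reports CPU time charged to the process, not wall-clock time, so the totals printed by the program are CPU seconds.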

score 5 · Accepted Answer

Printing to the screen is very slow compared with piping or running without printing. Piping to grep keeps you from doing that.

score 2 · Accepted Answer

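This isn't about printing to the screen; it's about the output being a terminal (tty).

According to the POSIX spec:

When opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.

Linux interprets this to make the FILE * (i.e. stdio) stdout line-buffered when the output is a tty (e.g. your terminal window) and block-buffered otherwise (e.g. your pipe).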

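The reason sync_with_stdio makes a difference is that when it is enabled, the C++ cout stream inherits this behavior. When you set it to false, cout is no longer bound by that behavior and so becomes block-buffered.

Block buffering is faster because it avoids the overhead of flushing the buffer on every newline.

You can verify this further by piping to cat instead of grep. The difference is the pipe itself, not the screen per se.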
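The tty-versus-pipe distinction can also be checked, and overridden, from inside a program. The following is a minimal sketch assuming a POSIX system; is_terminal and use_block_buffering are hypothetical helper names, while isatty, fileno and setvbuf are the standard calls doing the work:

```cpp
#include <cstddef>
#include <cstdio>     // std::FILE, std::setvbuf, _IOFBF, fileno (POSIX)
#include <unistd.h>   // isatty (POSIX)

// Hypothetical helper: true when the stream is connected to a terminal.
bool is_terminal(std::FILE* stream) {
    return isatty(fileno(stream)) != 0;
}

// Hypothetical helper: request full (block) buffering with a buffer of
// `size` bytes allocated by the library. setvbuf must be called before
// any other operation on the stream. Returns true on success.
bool use_block_buffering(std::FILE* stream, std::size_t size) {
    return std::setvbuf(stream, nullptr, _IOFBF, size) == 0;
}
```

Calling use_block_buffering(stdout, 1 << 16) at the very start of main would ask stdio to treat a terminal the same way it treats a pipe, which is one way to test whether line buffering accounts for the gap.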

score 0 · Accepted Answer

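Thank you, Collin and Nemo. I was certain that because I wasn't calling std::cout between the start and stop times, it wouldn't have an effect. Not so. I think this is due to optimizations that the compiler performs even with -O0 or the defaults.

What I think is happening...? I think that, as Collin suggested, the compiler is trying to be clever about when it writes to the TTY. And, as Nemo pointed out, cout inherits the line-buffered property of stdio.

I'm able to reduce the effect, but not eliminate it, by using: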

std::cout.sync_with_stdio(false);

From my limited reading on this, it should be called before any output operations are done. Here is the source for the no_sync version: https://www.dropbox.com/s/wugo7hxvu9ao8i3/prng_bench_no_sync.cpp

./no_sync 3 10 999999;./no_sync 3 10 999999|grep

Compiled with -O0

999999: All 3 base(10) digits =  3 base(10)  Time:    0.00004 secs.  Tries: 23
It took 166.30801 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00017 secs & 99 tries was needed to find each one. 

It took 163.72914 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.

Compiled with -O3

999999: All 3 base(10) digits =  3 base(10)  Time:    0.00003 secs.  Tries: 23
It took 143.23234 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00014 secs & 99 tries was needed to find each one. 

It took 140.36195 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.

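Specifying not to sync with stdio changed the delta between piped and non-piped runs from more than 30 seconds to less than 3. See the original question for the original delta; it was ~191 versus ~160.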

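To test further, I created another version that uses a struct to store the statistics for each pass. This method does all of the output after all passes are complete. I want to stress that this is probably a terrible idea. I let a command-line argument determine the size of a dynamically allocated array of structs, each containing an int, a double and an unsigned long. I can't even run this version with 999,999 passes; I get a segmentation fault. https://www.dropbox.com/s/785ntsm622q9mwd/prng_bench_struct.cpp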

./struct_prng 3 10 99999;./struct_prng 3 10 99999|grep

Pass# 99999: All 3 base(10) digits =  6 base(10)  Time:    0.00025 secs.  Tries: 193
It took 13.10071 secs & 9970298 tries to find 99999 repeating 3 digit base(10) numbers.
An average of 0.00013 secs & 99 tries was needed to find each one. 

It took 13.12466 secs & 9970298 tries to find 99999 repeating 3 digit base(10) numbers.

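What I have learned from this is that you can't count on the order in which you code things being the order in which they execute. In future programs I will probably use getopt instead of writing my own parse_args function. That would let me suppress extraneous output in high-repetition loops by requiring users to pass a -v switch when they want to see it.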
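A rough sketch of what the -v idea might look like with POSIX getopt; the Options struct and parse_options function are hypothetical and are not part of any of the linked sources.

```cpp
#include <unistd.h>   // getopt, optind (POSIX)

// Hypothetical result type: whether -v was given, and the index in argv
// of the first positional argument (N, B and the optional T).
struct Options {
    bool verbose = false;
    int first_positional = 1;
};

Options parse_options(int argc, char* argv[]) {
    Options opts;
    optind = 1;  // restart scanning so the function can be called again
    int c;
    while ((c = getopt(argc, argv, "v")) != -1) {
        if (c == 'v') {
            opts.verbose = true;
        }
        // Unknown switches make getopt return '?'; they are ignored here.
    }
    opts.first_positional = optind;
    return opts;
}
```

The per-pass line would then be printed only when opts.verbose is true, leaving the totals as the default output.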

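I hope the further testing is useful to anyone wondering about piping and output in loops. All of the results I've posted were obtained on a RasPi. All of the linked source code is GPL, only because that's the first license I could think of... I really have no need for the GPL's copyleft provisions for the sake of self-aggrandizement; I just wanted to be clear that it is free, but comes with no warranty or liability.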

Note that all of the linked sources have the call to srand(...) commented out, so all of your pseudo-random results will be exactly the same.
