python - Why is printing to stdout so slow? Can it be sped up?

Question

I've always been amazed/frustrated with how long it takes to simply output to the terminal with a print statement. After some recent painfully slow logging I decided to look into it and was quite surprised to find that almost all the time spent is waiting for the terminal to process the results.

Can writing to stdout be sped up somehow?

I wrote a script ('print_timer.py' at the bottom of this question) to compare timing when writing 100k lines to stdout, to file, and with stdout redirected to /dev/null. Here is the timing result:

$ python print_timer.py
this is a test
this is a test
<snipped 99997 lines>
this is a test
-----
timing summary (100k lines each)
-----
print                         :11.950 s
write to file (+ fsync)       : 0.122 s
print with stdout = /dev/null : 0.050 s

Wow. To make sure python isn't doing something behind the scenes like recognizing that I reassigned stdout to /dev/null or something, I did the redirection outside the script...

$ python print_timer.py > /dev/null
-----
timing summary (100k lines each)
-----
print                         : 0.053 s
write to file (+fsync)        : 0.108 s
print with stdout = /dev/null : 0.045 s

So it isn't a python trick, it is just the terminal. I always knew dumping output to /dev/null sped things up, but never figured it was that significant!

It amazes me how slow the tty is. How can it be that writing to physical disk is WAY faster than writing to the "screen" (presumably an all-RAM op), and is effectively as fast as simply dumping to the garbage with /dev/null?

This link talks about how the terminal will block I/O so it can "parse [the input], update its frame buffer, communicate with the X server in order to scroll the window and so on"... but I don't fully get it. What can be taking so long?

I expect there is no way out (short of a faster tty implementation?) but figure I'd ask anyway.

UPDATE: after reading some comments I wondered how much impact my screen size actually has on the print time, and it does have some significance. The really slow numbers above are with my Gnome terminal blown up to 1920x1200. If I reduce it very small I get...

-----
timing summary (100k lines each)
-----
print                         : 2.920 s
write to file (+fsync)        : 0.121 s
print with stdout = /dev/null : 0.048 s

That is certainly better (~4x), but doesn't change my question. It only adds to my question as I don't understand why the terminal screen rendering should slow down an application writing to stdout. Why does my program need to wait for screen rendering to continue?

Are all terminal/tty apps not created equal? I have yet to experiment. It really seems to me like a terminal should be able to buffer all incoming data, parse/render it invisibly, and only render the most recent chunk that is visible in the current screen configuration at a sensible frame rate. So if I can write+fsync to disk in ~0.1 seconds, a terminal should be able to complete the same operation in something of that order (with maybe a few screen updates while it did it).

I'm still kind of hoping there is a tty setting that can be changed from the application side to make this behaviour better for programmer. If this is strictly a terminal application issue, then this maybe doesn't even belong on StackOverflow?

What am I missing?

Here is the python program used to generate the timing:

import time, sys, tty
import os

lineCount = 100000
line = "this is a test"
summary = ""

cmd = "print"
startTime_s = time.time()
for x in range(lineCount):
    print line
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

#Add a newline to match line outputs above...
line += "\n"

cmd = "write to file (+fsync)"
fp = file("out.txt", "w")
startTime_s = time.time()
for x in range(lineCount):
    fp.write(line)
os.fsync(fp.fileno())
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

cmd = "print with stdout = /dev/null"
sys.stdout = file(os.devnull, "w")
startTime_s = time.time()
for x in range(lineCount):
    fp.write(line)
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

print >> sys.stderr, "-----"
print >> sys.stderr, "timing summary (100k lines each)"
print >> sys.stderr, "-----"
print >> sys.stderr, summary

score 179 · Accepted Answer

写入物理磁盘怎么会比写入“屏幕”（可能是全 RAM 操作）快得多，并且实际上与使用 /dev/null 转储到垃圾中一样快？

恭喜，您刚刚发现了 I/O 缓冲的重要性。:-)

磁盘似乎更快，因为它是高度缓冲的：所有 Pythonwrite()调用都在实际写入物理磁盘之前返回。（操作系统稍后会这样做，将数千个单独的写入组合成一个大而有效的块。）

另一方面，终端很少或没有缓冲：每个人print/write(line)等待完整的写入（即显示到输出设备）完成。

为了使比较公平，您必须使文件 test 使用与终端相同的输出缓冲，您可以通过将示例修改为：

fp = file("out.txt", "w", 1)   # line-buffered, like stdout
[...]
for x in range(lineCount):
    fp.write(line)
    os.fsync(fp.fileno())      # wait for the write to actually complete

我在我的机器上运行了你的文件写入测试，并且通过缓冲，它在这里也是 0.05 秒 100,000 行。

但是，通过上述对无缓冲写入的修改，仅将 1,000 行写入磁盘需要 40 秒。我放弃了等待 100,000 行来写，但从前面推断，它需要一个多小时。

这让终端机的 11 秒变得清晰起来，不是吗？

因此，要回答您最初的问题，考虑到所有因素，写入终端实际上非常快，并且没有太多空间可以让它更快（但各个终端的工作量确实有所不同；请参阅 Russ 对此的评论回答）。

（您可以添加更多的写入缓冲，例如使用磁盘 I/O，但是在刷新缓冲区之前您不会看到写入终端的内容。这是一种权衡：交互性与批量效率。）

score 107 · Accepted Answer

感谢所有的评论！在您的帮助下，我最终自己回答了这个问题。不过，回答你自己的问题感觉很脏。

问题 1：为什么打印到标准输出很慢？

回答：打印到标准输出本身并不慢。您使用的终端很慢。它与应用程序端的 I/O 缓冲几乎为零（例如：python 文件缓冲）。见下文。

问题2：可以加速吗？

答：可以，但似乎不是从程序方面（对标准输出进行“打印”的一方）。要加快速度，请使用更快的不同终端仿真器。

解释...

我尝试了一个自我描述的“轻量级”终端程序wterm，并得到了明显更好的结果。以下是我的测试脚本的输出（在问题的底部），wterm在同一系统上以 1920x1200 运行时，基本打印选项使用 gnome-terminal 需要 12 秒：

-----
时序摘要（每个 100k 行）
-----
打印：0.261 秒
写入文件 (+fsync) : 0.110 s
使用 stdout = /dev/null 打印：0.050 秒

0.26s 比 12s 好很多！我不知道wterm它是如何按照我的建议（以合理的帧速率渲染“可见”尾巴）更智能地渲染到屏幕上，还是只是“做得更少” gnome-terminal。不过，就我的问题而言，我已经得到了答案。 gnome-terminal是缓慢的。

所以 - 如果你有一个长时间运行的脚本，你觉得它很慢并且它会向标准输出发送大量文本......尝试不同的终端，看看它是否更好！

请注意，我几乎是wterm从 ubuntu/debian 存储库中随机抽取的。这个链接可能是同一个终端，但我不确定。我没有测试任何其他终端仿真器。

更新：因为我不得不抓痒，我用相同的脚本和全屏 (1920x1200) 测试了一大堆其他终端模拟器。我手动收集的统计数据在这里：

wterm 0.3s
aterm 0.3s
rxvt 0.3s
mrxvt 0.4s
控制台0.6s
yakuake 0.7s
lxterminal 7s
xterm 9s
gnome 终端 12s
xfce4-终端 12s
vala-terminal 18s
xvt 48s

记录的时间是手动收集的，但它们非常一致。我记录了最好的（ish）价值。YMMV，显然。

作为奖励，这是对一些可用的各种终端仿真器的有趣之旅！我很惊讶我的第一个“替代”测试竟然是最好的。

score 14 · Accepted Answer

您的重定向可能没有任何作用，因为程序可以确定它们的输出 FD 是否指向 tty。

指向终端时，stdout 很可能是行缓冲的（与 C 的流行为相同stdout）。

作为一个有趣的实验，尝试将输出通过管道传输到cat.

我已经尝试过自己有趣的实验，结果如下。

$ python test.py 2>foo
...
$ cat foo
-----
timing summary (100k lines each)
-----
print                         : 6.040 s
write to file                 : 0.122 s
print with stdout = /dev/null : 0.121 s

$ python test.py 2>foo |cat
...
$ cat foo
-----
timing summary (100k lines each)
-----
print                         : 1.024 s
write to file                 : 0.131 s
print with stdout = /dev/null : 0.122 s

score 4 · Accepted Answer

我不能谈论技术细节，因为我不了解它们，但这并不让我感到惊讶：终端不是为打印大量这样的数据而设计的。实际上，您甚至可以提供一个指向大量 GUI 内容的链接，每次您想要打印某些东西时它都必须这样做！请注意，如果您pythonw改为调用脚本，则不需要 15 秒；这完全是一个 GUI 问题。重定向stdout到文件以避免这种情况：

import contextlib, io
@contextlib.contextmanager
def redirect_stdout(stream):
    import sys
    sys.stdout = stream
    yield
    sys.stdout = sys.__stdout__

output = io.StringIO
with redirect_stdout(output):
    ...

score 3 · Accepted Answer

打印到终端会很慢。不幸的是，没有编写一个新的终端实现，我真的看不出你会如何显着加快速度。

score 2 · Accepted Answer

除了输出可能默认为行缓冲模式外，输出到终端还会导致您的数据流入具有最大吞吐量的终端和串行线路，或伪终端和处理显示的单独进程事件循环，从某种字体渲染字符，移动显示位以实现滚动显示。后一种情况可能分布在多个进程上（例如 telnet 服务器/客户端、终端应用程序、X11 显示服务器），因此也存在上下文切换和延迟问题。

python - Why is printing to stdout so slow? Can it be sped up?

6 回答 6

Related

Reference