python - Python's subprocess.Popen skips input

Question

I found that subprocess.Popen() skips input bytes under certain conditions. To demonstrate the problem, I wrote the following (otherwise pointless) program:

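import sys
from subprocess import Popen

skip = int(sys.argv[1])       # number of input bytes to skip
fin = sys.stdin
fin.read(skip)                # consume the first `skip` bytes
cmd = 'wc -c'.split()
Popen(cmd, stdin=fin).wait()  # wc should count the remaining bytes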

The program skips the given number of input bytes, and then wc counts the bytes that remain.

Now run the program on input generated with dd:

$ # skipping 0, everything works fine:
$ dd if=/dev/zero bs=1 count=100 2>/dev/null | python wc.py 0
100

$ # but skipping more than 0 yields an unexpected result.
$ # this should return 99:
$ dd if=/dev/zero bs=1 count=100 2>/dev/null | python wc.py 1
0

$ # I noticed it skips up to the 4k boundary.
$ # this should return 8191:
$ dd if=/dev/zero bs=1 count=8192 2>/dev/null | python wc.py 1
4096
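To reproduce the effect without dd or wc, here is a minimal sketch, assuming Python 3.7+. It substitutes an os.pipe() for the shell pipe and a tiny Python child (the hypothetical COUNT_STDIN snippet) for wc -c; count_after_buffered_skip is likewise a hypothetical helper name. The file is opened with io.open(fd, "rb"), which gives the same kind of buffered reader that sys.stdin sits on.

```python
import io
import os
import subprocess
import sys

# Child process that counts the bytes on its stdin (stands in for `wc -c`).
COUNT_STDIN = "import sys; print(len(sys.stdin.buffer.read()))"


def count_after_buffered_skip(total, skip):
    """Feed `total` bytes through a pipe, skip `skip` of them through a
    buffered reader, hand the same file to a child, return the child's count."""
    r, w = os.pipe()
    os.write(w, b"\0" * total)  # small enough to fit in the pipe buffer
    os.close(w)                 # so the child eventually sees EOF
    fin = io.open(r, "rb")      # buffered reader, like sys.stdin
    fin.read(skip)              # may pull far more than `skip` bytes off the fd
    out = subprocess.run([sys.executable, "-c", COUNT_STDIN],
                         stdin=fin, capture_output=True, check=True)
    fin.close()
    return int(out.stdout)


if __name__ == "__main__":
    print(count_after_buffered_skip(100, 0))  # the child sees all 100 bytes
    print(count_after_buffered_skip(100, 1))  # 99 expected, the child sees fewer
```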

Can anyone explain this unexpected behavior? Is it a known issue? A bug I should report? "You're doing it wrong"?

FWIW, I eventually worked around it by giving the child a stdin pipe and feeding it the data one chunk at a time:

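# continues the program above: fin, cmd and the fin.read(skip) call are unchanged
# (on Python 3, use fin = sys.stdin.buffer so that the chunks are bytes)
from subprocess import Popen, PIPE

CHUNK_SIZE = 4096  # any reasonable chunk size works

p = Popen(cmd, stdin=PIPE)
chunk = fin.read(CHUNK_SIZE)  # read through fin, so its buffered bytes are not lost
while chunk:
    p.stdin.write(chunk)
    chunk = fin.read(CHUNK_SIZE)
p.stdin.close()  # signal EOF to wc
p.wait()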
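The same chunked copy can be delegated to the standard library. This is a sketch, assuming Python 3 and a binary file object; pipe_rest_to is a hypothetical helper name, and shutil.copyfileobj simply performs the read/write loop shown above.

```python
import shutil
import subprocess


def pipe_rest_to(cmd, fin, stdout=None, chunk_size=64 * 1024):
    """Stream everything still readable from the binary file `fin`, including
    bytes already sitting in Python's read-ahead buffer, into `cmd`'s stdin.
    Returns the child's exit status."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=stdout)
    shutil.copyfileobj(fin, p.stdin, chunk_size)  # same read/write loop as above
    p.stdin.close()  # signal EOF so the child can finish
    return p.wait()
```

In the question's program this would become fin = sys.stdin.buffer, then fin.read(skip), then pipe_rest_to(cmd, fin). Because every byte goes through fin, nothing stranded in its buffer is lost.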

score 3 · Accepted Answer

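The .read() function on sys.stdin is buffered internally by Python. So when you read one byte, Python actually reads a whole buffer's worth from the operating system, expecting you to ask for more soon. Those bytes (4096 of them in your case) have now been taken out of the pipe and sit in your process's buffer, so as far as the operating system is concerned they have already been read, and it never passes them on to wc.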
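The read-ahead can be observed directly. This sketch (Python 3; bytes_left_on_fd is a hypothetical helper) picks the buffer size explicitly with io.open(..., buffering=...) rather than relying on whatever sys.stdin uses, because the default size varies between Python versions. It reads a few bytes through the buffered object and then counts what is still left on the raw descriptor.

```python
import io
import os


def bytes_left_on_fd(total, skip, buffer_size):
    """Write `total` bytes into a pipe, read `skip` of them through a buffered
    reader, and report (bytes returned to the caller, bytes left on the fd)."""
    r, w = os.pipe()
    os.write(w, b"x" * total)  # fits in the pipe buffer on common systems
    os.close(w)
    fin = io.open(r, "rb", buffering=buffer_size)
    got = fin.read(skip)       # the reader may fetch up to buffer_size bytes
    left = 0
    while True:
        chunk = os.read(r, 65536)  # raw read: bypasses fin's buffer entirely
        if not chunk:
            break
        left += len(chunk)
    fin.close()
    return len(got), left
```

Calling bytes_left_on_fd(10000, 1, 4096) hands 1 byte to the caller, yet far fewer than 9999 bytes remain on the descriptor; the difference is exactly what a child process like wc would miss.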

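You can avoid this problem by using os.read() to skip the required number of input bytes. It calls the operating system directly and does not buffer any data in your process: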

import os
os.read(fin.fileno(), skip)  # use instead of fin.read(skip); nothing is buffered
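One caveat: on a pipe, os.read() may return fewer bytes than requested if the writer has not produced them yet, so a single call is not guaranteed to skip everything. A minimal sketch of a looping version, assuming Python 3 (skip_unbuffered is a hypothetical name):

```python
import os


def skip_unbuffered(fd, n, max_chunk=65536):
    """Discard up to `n` bytes from file descriptor `fd` without buffering.

    os.read() on a pipe may return short reads, so keep reading until `n`
    bytes are consumed or EOF is hit. Returns the number actually skipped.
    """
    skipped = 0
    while skipped < n:
        chunk = os.read(fd, min(n - skipped, max_chunk))
        if not chunk:  # EOF: fewer than n bytes were available
            break
        skipped += len(chunk)
    return skipped
```

In the question's program, skip_unbuffered(sys.stdin.fileno(), skip) would replace fin.read(skip). It only works if nothing has read from sys.stdin through Python's buffer beforehand.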
