python - Reading subprocess output one character at a time with multi-byte characters

Question

I am running a process with subprocess:

    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)

What I want to do is read the output one character at a time in a loop:

    while something:
        char = p.stdout.read(1)

In Python 3, subprocess.Popen().stdout.read() returns bytes, not str. I want to use the result as a str, so I have to do this:

    char = char.decode("utf-8")

This works for ASCII characters.

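But for non-ASCII characters (for example Greek letters) I get a UnicodeDecodeError, because a Greek character is encoded as more than one byte in UTF-8. Here is the problem: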

    >>> b'\xce\xb5'.decode('utf-8')
    'ε'
    >>> b'\xce'.decode('utf-8') # b'\xce' is what subprocess...read(1) returns - one byte
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: unexpected end of data
    >>>

How can I deal with this? The output of subprocess.Popen().stdout.read() (as a string) can be something like "lorem ipsum εφδφδσloremipsum".

I want to read one character at a time, but a character may consist of several bytes.

score 4 · Accepted Answer

Wrap the file object in io.TextIOWrapper() to decode the pipe on the fly:

    import io

    reader = io.TextIOWrapper(p.stdout, encoding='utf8')
    while something:
        char = reader.read(1)
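
For a version that can be run end to end, here is a minimal sketch of the same idea. The helper name `read_chars` and the child command are hypothetical and exist only for illustration: the child is a second Python interpreter that writes a few UTF-8 bytes to its stdout. The loop stops when `read(1)` returns an empty string, which signals end of file.

```python
import io
import subprocess
import sys


def read_chars(cmd, encoding="utf-8"):
    """Yield the child's stdout one decoded character at a time.

    Hypothetical helper for illustration; not part of subprocess or io.
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # TextIOWrapper buffers incomplete byte sequences internally, so
    # read(1) returns one whole character, never half of one.
    reader = io.TextIOWrapper(p.stdout, encoding=encoding)
    try:
        while True:
            char = reader.read(1)
            if not char:  # empty string means the pipe reached EOF
                break
            yield char
    finally:
        reader.close()  # also closes the underlying p.stdout
        p.wait()


# Assumed child command: emits b"lorem " followed by the UTF-8 bytes
# for the Greek letters epsilon, phi and delta.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'lorem \\xce\\xb5\\xcf\\x86\\xce\\xb4')",
]

chars = list(read_chars(child))
print(chars)
```

On Python 3.6 and later, passing `encoding="utf-8"` directly to `subprocess.Popen` should have a similar effect, since `p.stdout` is then opened in text mode and `p.stdout.read(1)` returns a str.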
