python-2.7 - Pexpect throws a unicode decode error when running the make command to compile a C library

Question

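I am running make to compile a C library inside a Python project, and I use pexpect (with Python 3.3) to automate it. pexpect reads the output of the make command in chunks, and on one such chunk it raises the error below while converting Python 3 bytes to a Python 3 str. The main difficulty is that the problem is intermittent and does not occur often.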

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 1998-1999: unexpected end of data

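--> The sample code below shows what happens when the data contains multi-byte characters (i.e. special characters or any Unicode data): decoding fails when pexpect processes a chunk that holds only part of a multi-byte character.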

#!/usr/bin/python 
# -*- coding: utf-8 -*-
from base import pexpect

MAX_READ_CHUNK = 8

def run(cmd):
    # A small maxread forces pexpect to read the output in tiny chunks.
    child = pexpect.spawn(cmd, maxread=MAX_READ_CHUNK)
    while True:
        i = child.expect([pexpect.EOF, pexpect.TIMEOUT])

        if child.before:
            print(child.before)

        if i == 0:  # EOF
            break
        elif i == 1:  # TIMEOUT
            continue

    child.close()
    return child.exitstatus

############## Main ################
data='“HELLO WORLD”' 
# i.e. data.encode('utf-8') == b'\xe2\x80\x9cHELLO WORLD\xe2\x80\x9d'
print("Data in readable form = %s "%data)
print("Data in bytes         = %s \n\n"%data.encode('utf-8'))

run("echo %s"%data)

This produces the following traceback:

Data in readable form = “HELLO WORLD” 
Data in bytes         = b'\xe2\x80\x9cHELLO WORLD\xe2\x80\x9d' 


_cast_unicode() enc=[utf-8] s=[b'\xe2\x80\x9cHELLO'] 
_cast_unicode() enc=[utf-8] s=[b' WORLD\xe2\x80'] 
Traceback (most recent call last):
  File "test.py", line 33, in <module>
    run("echo %s"%data)
  File "test.py", line 11, in run
    i = child.expect([pexpect.EOF,pexpect.TIMEOUT])
  File "/home/test/Downloads/base/pexpect.py", line 1358, in expect
    return self.expect_list(compiled_pattern_list, timeout, searchwindowsize)
  File "/home/test/Downloads/base/pexpect.py", line 1372, in expect_list
    return self.expect_loop(searcher_re(pattern_list), timeout, searchwindowsize)
  File "/home/test/Downloads/base/pexpect.py", line 1425, in expect_loop
    c = self.read_nonblocking (self.maxread, timeout)
  File "/home/test/Downloads/base/pexpect.py", line 1631, in read_nonblocking
    return super(spawn, self).read_nonblocking(size=size, timeout=timeout)\
  File "/home/test/Downloads/base/pexpect.py", line 868, in read_nonblocking
    s2 = self._cast_buffer_type(s)
  File "/home/test/Downloads/base/pexpect.py", line 1614, in _cast_buffer_type
    return _cast_unicode(s, self.encoding)
  File "/home/test/Downloads/base/pexpect.py", line 156, in _cast_unicode
    return s.decode(enc)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 6-7:
 unexpected end of data

When MAX_READ_CHUNK in the code above is changed to 9, it works fine.

# Output when MAX_READ_CHUNK = 9
Data in readable form = “HELLO WORLD” 
Data in bytes         = b'\xe2\x80\x9cHELLO WORLD\xe2\x80\x9d' 


_cast_unicode() enc=[utf-8] s=[b'\xe2\x80\x9cHELLO '] 
_cast_unicode() enc=[utf-8] s=[b'WORLD\xe2\x80\x9d\r'] 
_cast_unicode() enc=[utf-8] s=[b'\n'] 
“HELLO WORLD”

How can I handle this "UnicodeDecodeError: 'utf-8' codec can't decode bytes in position ...: unexpected end of data" while running make?

score 0 · Accepted Answer

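What is happening is that pexpect cannot handle a Unicode code point whose bytes straddle two read buffers. In your example, \xe2\x80\x9d cannot be decoded because, with a chunk size of 8, the second chunk ends after \xe2\x80 and the final \x9d byte arrives only in the next read.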
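The failure can be reproduced without pexpect at all. The sketch below is only an illustration: `chunks` and `first_bad_chunk` are hypothetical helper names, and `OUTPUT` is assumed to be what the `echo` command from the question writes to the terminal (the text followed by `\r\n`).

```python
# Illustration only: decode a byte stream chunk by chunk, the way the
# traceback shows _cast_unicode() being called on each read.
# '\u201c' and '\u201d' are the curly quotes used in the question.
OUTPUT = '\u201cHELLO WORLD\u201d'.encode('utf-8') + b'\r\n'


def chunks(data, size):
    """Split data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def first_bad_chunk(data, size):
    """Return the first chunk that is not valid UTF-8 on its own, or None."""
    for piece in chunks(data, size):
        try:
            piece.decode('utf-8')
        except UnicodeDecodeError:
            return piece
    return None


print(first_bad_chunk(OUTPUT, 8))  # b' WORLD\xe2\x80'
print(first_bad_chunk(OUTPUT, 9))  # None
```

With a chunk size of 8 the second chunk ends in the middle of the closing quote, which matches the `s=[b' WORLD\xe2\x80']` line in the traceback. With 9 every chunk boundary happens to fall between characters.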

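Unfortunately, I am not familiar with how to work around this, but I can imagine two approaches:

- Try setting maxread to 1 (unbuffered), or
- (this is dirty) catch the exception, buffer the output, and process it together with the next output window.

If you are dealing with buffers of a known size, set maxread to the buffer size.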
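The second approach does not strictly need exception handling. Python's standard `codecs` module provides incremental decoders that hold back an incomplete multi-byte sequence until the remaining bytes arrive. A minimal sketch, assuming the output can be obtained as raw bytes chunks (how to get bytes out of your pexpect version is not shown here, and `decode_chunks` is a hypothetical helper name):

```python
import codecs


def decode_chunks(byte_chunks, encoding='utf-8'):
    """Yield text for each bytes chunk, carrying partial characters over."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for piece in byte_chunks:
        # Trailing bytes of an incomplete character stay inside the decoder
        # and are combined with the next chunk.
        text = decoder.decode(piece)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b'', final=True)
    if tail:
        yield tail


# The same 8-byte chunks that failed in the traceback above.
pieces = [b'\xe2\x80\x9cHELLO', b' WORLD\xe2\x80', b'\x9d\r\n']
print(''.join(decode_chunks(pieces)))
```

Feeding the decoder one byte at a time also works, so the chunk size stops mattering once decoding is done this way.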
