python - Subprocess, encoding, and logging problems with sqlite

Question

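I have been searching for an answer to this for quite a while, and I think it largely comes down to my unfamiliarity with how the subprocess module works. This is for a fuzzing program, if anyone is interested. I should also mention that all of this is being done on Linux (I think that is relevant). I have some code like this: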

import subprocess

# run the process and collect its return code and stderr output
process = subprocess.Popen([app, file_name], stdout=subprocess.PIPE,
                                             stderr=subprocess.PIPE)
# communicate() drains both pipes and waits for exit; calling wait() first
# can deadlock once the child fills a pipe buffer
error_msg = process.communicate()[1]
return_code = process.returncode

# insert results into an sqlite database log
log_cur.execute('''INSERT INTO log (return_code, error_msg)
                   VALUES (?,?)''', [unicode(return_code), unicode(error_msg)])
log_db.commit()

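This works fine 99 times out of 100, but occasionally I get an error similar to the following:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 43: invalid continuation byte

Edit: full traceback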

Traceback (most recent call last):
  File "openscadfuzzer.py", line 72, in <module>
    VALUES (?,?)''', [crashed, err_msg.decode('utf-8')])
  File "/home/username/workspace/GeneralPythonEnv/openscadfuzzer/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xdb in position 881: invalid continuation byte

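Is this a problem with subprocess, with the application I am running through it, or with my code? Any pointers would be appreciated (especially where they concern the correct use of subprocess stdout and stderr).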

score 2 · Accepted Answer

My guess is that the problem is in this call:

unicode(error_msg)

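What is the type of error_msg? I am fairly sure that, by default, the subprocess API returns the raw bytes of the child program's output. Calling unicode tries to convert those bytes into characters (code points), assuming some encoding (in this case utf8).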
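To check the type claim above, here is a minimal sketch, assuming Python 3. The child command is a hypothetical stand-in for the real application: a Python one-liner that writes a byte sequence that is not valid UTF-8 to stderr.

```python
import subprocess
import sys

# Hypothetical child process: writes the bytes h, 0xce, l, l, o to stderr.
child = [sys.executable, "-c",
         "import sys; sys.stderr.buffer.write(b'h\\xcello')"]

process = subprocess.Popen(child, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
out, err = process.communicate()

# Without text mode or an explicit encoding, the pipes yield raw bytes.
print(type(err))
print(repr(err))
```

Nothing is decoded until the caller asks for it, so the choice of codec is left to the code that logs the message.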

My guess is that these bytes are not valid utf8 but are valid latin1. You can specify which codec to use when converting between bytes and characters:

error_msg.decode('latin1')

Here is an example that hopefully demonstrates both the problem and the fix:

>>> b'h\xcello'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 1: invalid continuation byte

>>> b'h\xcello'.decode('latin1')
'hÎllo'

A better solution might be to make your subprocess output utf8, but that also depends on what data your database is able to store.
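When the child's encoding cannot be controlled or is not known, one option is to decode leniently and keep the raw bytes alongside the text. The following is only a sketch, assuming Python 3 and an in-memory database; the `to_text` helper and the `raw_msg` column are hypothetical additions, not part of the original schema.

```python
import sqlite3

def to_text(raw):
    """Decode bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")

log_db = sqlite3.connect(":memory:")
log_cur = log_db.cursor()
# Hypothetical schema: readable text plus the untouched bytes as a BLOB.
log_cur.execute(
    "CREATE TABLE log (return_code INTEGER, error_msg TEXT, raw_msg BLOB)")

raw = b"h\xcello"  # same invalid UTF-8 sample as in the example above
log_cur.execute(
    "INSERT INTO log (return_code, error_msg, raw_msg) VALUES (?, ?, ?)",
    (1, to_text(raw), raw))
log_db.commit()

row = log_cur.execute(
    "SELECT return_code, error_msg, raw_msg FROM log").fetchone()
print(row)
```

Decoding with `errors="replace"` never raises, at the cost of losing the offending bytes in the text column; the BLOB column preserves them for later inspection, which can matter for a fuzzer.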

score 1 · Accepted Answer

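You can find a very good subprocess tutorial here: http://pymotw.com/2/subprocess/, and the official documentation here: http://docs.python.org/2/library/subprocess.html. However, judging by how the error you are getting is formatted, it seems that it is not your code but your application that hits the error, and you only see it because you are collecting the output. To confirm this, you could run your application outside your code in a simple bash loop to see whether you can catch the error again, and check the application's exit code in your code. When you see the error, the exit code should be different from 0, provided the application sets its exit code properly.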
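The exit-code check described above could look roughly like this. It is a sketch assuming Python 3; `run_once` is a hypothetical helper, and the child command is a stand-in for the real application that deliberately exits with status 3.

```python
import subprocess
import sys

def run_once(cmd):
    """Run cmd, returning its exit code and the raw bytes it wrote to stderr."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    _, err = process.communicate()
    return process.returncode, err

# Hypothetical stand-in for the application: writes to stderr, exits with 3.
cmd = [sys.executable, "-c",
       "import sys; sys.stderr.write('boom'); sys.exit(3)"]

return_code, err = run_once(cmd)
if return_code != 0:
    # A non-zero exit code suggests the application itself reported a failure.
    print("child failed with exit code", return_code, repr(err))
```

Logging the exit code next to the stderr output makes it easier to tell a failure in the application apart from a decoding problem in the logging code.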
