python - Error reading a file with accented vowels

Question

The following statement populates a list from a file:

import os

action = []

with open(os.getcwd() + "/files/" + "actions.txt") as temp:
    action = list(temp)

It gives me the following error:

(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 67: invalid continuation byte

If I add errors='ignore':

action = []

with open(os.getcwd() + "/files/" + "actions.txt", errors='ignore') as temp:
    action = list(temp)

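the file is read, but without the ñ or the accented vowels á-é-í-ó-ú. I am working with Python 3, which as far as I know defaults to 'utf-8'.

I have been looking for a solution for two days or more, and I am getting more and more confused.

Thank you very much in advance for any advice.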

score 2 · Accepted Answer

You should open the file with codecs, using the correct encoding.

import codecs
import os

# encoding must match the file's real encoding; "utf8" only works if the file is UTF-8
with codecs.open(os.getcwd() + "/files/" + "actions.txt", "r", encoding="utf8") as temp:
    action = list(temp)

See the codecs documentation.
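In Python 3 the built-in open() accepts the same encoding argument, so codecs is not strictly required. Here is a minimal sketch, assuming the file is really Windows-1252 (a guess based on byte 0xf1, which is ñ in that encoding and in Latin-1). The helper name read_lines and the temporary demo file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile


def read_lines(path, encoding="cp1252"):
    """Return the lines of a text file, decoded with the given encoding.

    cp1252 is an assumption; pass whatever encoding the file really uses.
    """
    with open(path, encoding=encoding) as handle:
        return list(handle)


# Demo: write bytes the way a Windows-1252 editor would, then read them back.
demo_path = os.path.join(tempfile.mkdtemp(), "actions.txt")
with open(demo_path, "wb") as handle:
    handle.write("año\ncanción\n".encode("cp1252"))

action = read_lines(demo_path)
print(action)
```

If the file turns out to use a different encoding, only the encoding argument needs to change.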

score 2 · Accepted Answer

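As @Bogdan pointed out, you are probably not dealing with utf-8 data. You can use a module such as chardet to try to determine the encoding. If you are in a unix-y environment, you can also try running the file command on it to guess the encoding.

Using the character from your error message: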

>>> import chardet
>>> sample_bytes = b'\xf1'  # chardet.detect expects bytes in Python 3
>>> chardet.detect(sample_bytes)
{'confidence': 0.5, 'encoding': 'windows-1252'}
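To see why the strict read fails and why errors='ignore' silently loses characters, here is a small sketch using only the built-in bytes.decode. The sample bytes are made up for illustration, and windows-1252 is assumed only because that is what chardet guessed.

```python
# "año" as a Windows-1252 editor would store it: b'a\xf1o'
raw = "año".encode("cp1252")

# Strict UTF-8 decoding rejects 0xf1 because the next byte is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# errors="ignore" drops the offending byte, so the ñ disappears.
ignored = raw.decode("utf-8", errors="ignore")

# Decoding with the encoding the bytes were written in recovers the text.
decoded = raw.decode("cp1252")

print(utf8_ok, ignored, decoded)
```

This matches the behaviour described in the question: the file can be read with errors='ignore', but every ñ and accented vowel is lost.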
