python - UnicodeDecodeError: 'ascii' codec can't decode

Question

I'm using file.readline() in Python to read a file that contains Romanian words. Because of the encoding, many characters come out wrong.

Example:

>>> import sys
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8

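I've tried encode() with utf-8, cp500 and others, but it doesn't work.

Which character encoding am I supposed to use? I can't find the right one.

Thanks in advance.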

Edit: The goal is to store the words from the file in a dictionary and, when printing, get aberație instead of 'abera\xc8\x9bie'.

score 15 · Accepted Answer

What are you trying to do?

This is a sequence of bytes:

BYTES = 'abera\xc8\x9bie'

It is a sequence of bytes that represents the UTF-8 encoding of the string "aberație". Decode the bytes to get your unicode string:

>>> BYTES 
'abera\xc8\x9bie'
>>> print BYTES 
aberaÈ›ie
>>> abberation = BYTES.decode('utf-8')
>>> abberation 
u'abera\u021bie'
>>> print abberation 
aberație
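The byte pair \xc8\x9b is simply ț (U+021B) written in UTF-8's two-byte form 110xxxxx 10xxxxxx. As a sketch, assuming Python 3 and using a hypothetical helper utf8_two_byte (not a library function), the bits can be packed by hand and checked against the built-in codec:

```python
def utf8_two_byte(code_point):
    """Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this code point needs a different UTF-8 length")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx: top 5 of the 11 payload bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx: low 6 payload bits
    return bytes([lead, trail])

t_comma = "\u021b"  # LATIN SMALL LETTER T WITH COMMA BELOW
manual = utf8_two_byte(ord(t_comma))
print(manual)                            # b'\xc8\x9b'
print(manual == t_comma.encode("utf-8"))  # True

# Decoding the question's bytes yields the text the asker wanted.
word = b"abera\xc8\x9bie".decode("utf-8")
print(word == "abera\u021bie")           # True
```

Code points above U+07FF need three or four bytes, which is why the helper refuses them; in real code, str.encode('utf-8') and bytes.decode('utf-8') handle every length.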

If you want to store a unicode string in a file, you must encode it into a specific byte format of your choice:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
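For the goal in the question's edit (words from a file into a dictionary), a common approach is to decode once at the boundary: open the file with an explicit encoding so every line arrives as text, and encode only when writing back out. The sketch below assumes Python 3, a UTF-8 file with one word per line, and a dictionary mapping each word to its line number; the file name words.txt and the functions write_words and load_words are hypothetical. In Python 2, io.open accepts the same encoding argument.

```python
import os
import tempfile

def write_words(path, words):
    """Hypothetical helper: text is encoded to UTF-8 bytes on the way out."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")

def load_words(path):
    """Hypothetical helper: bytes are decoded from UTF-8 on the way in."""
    index = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            word = line.rstrip("\n")
            if word:  # skip blank lines
                index[word] = lineno
    return index

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    write_words(path, ["abera\u021bie", "casa"])

    # On disk the word is stored as the same UTF-8 bytes shown above.
    with open(path, "rb") as f:
        raw = f.read()
    print(raw.startswith(b"abera\xc8\x9bie"))  # True

    words = load_words(path)
    print(words["abera\u021bie"])              # 1
```

Decoding once when reading and encoding once when writing keeps everything in between (including the dictionary keys) as text, so no per-word encode() or decode() calls are needed.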
