python - UnicodeDecodeError: 'utf8' codec can't decode byte

Question

I am parsing an XML file that is encoded as "iso-8859-15".

Words like "Zürich" and "Aktienrückk" get converted to "ä ;" and so on.

I tried these suggestions:

p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text

But I get an error like UnicodeDecodeError: 'ascii' codec can't decode byte

Even this does not help:

content = unicode(mystring.strip(codecs.BOM_UTF8), 'utf-8')

I have tried many of the suggestions on Stack Overflow, but I cannot figure it out.

I need to write the parsed content back to an HTML file with the same characters intact (such as "ü").

score 1 · Accepted Answer

Try this:

from xml.etree import ElementTree
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
print p.text.encode('utf8')

found "拉柏 多公 园"

For your example:

# -*- coding: utf-8 -*-
from xml.etree import ElementTree
text = 'Aktienrückk'.decode('utf8')
print text.encode('utf8')

Aktienrückk

Don't forget to put # -*- coding: utf-8 -*- at the top of the file.
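
The snippets above are Python 2 and use UTF-8 input. For the ISO-8859-15 case in the question, a minimal Python 3 sketch might look like the following. The sample document, the variable names, and the HTML wrapper are made up for illustration, and it assumes the XML file declares its encoding in the prolog. It shows where the title error comes from (ISO-8859-15 bytes decoded as UTF-8) and one way to avoid it: hand the raw bytes to the parser, then encode explicitly when writing.

```python
from html import escape
from xml.etree import ElementTree

# Hypothetical sample: an XML document stored as ISO-8859-15 bytes,
# with the encoding declared in its prolog.
xml_bytes = (
    '<?xml version="1.0" encoding="iso-8859-15"?>'
    '<p>Zürich Aktienrückk</p>'
).encode('iso-8859-15')

# Decoding these bytes as UTF-8 fails: 0xFC ("ü" in ISO-8859-15)
# is not a valid UTF-8 start byte.
try:
    xml_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Passing the raw bytes lets the parser use the declared encoding,
# so the element text comes back as a proper str.
root = ElementTree.fromstring(xml_bytes)
text = root.text

# Build the HTML as text, then encode it explicitly in the target charset.
page = (
    '<html><head><meta charset="iso-8859-15"></head>'
    '<body><p>{}</p></body></html>'
).format(escape(text))
html_bytes = page.encode('iso-8859-15')
```

To write a file instead of building bytes in memory, opening it with open(path, 'w', encoding='iso-8859-15') and writing page should give the same result. The point is to decode once on input, work with text, and encode once on output.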
