python - Encoding/decoding unicode and utf-8: Python

Question

I have some HTML text: If I&#039;m reading lots of articles

I'm trying to replace &#039; and other such special characters with their Unicode equivalents, such as '. I tried

rawtxt.encode('utf-8').encode('ascii','ignore')

but it failed with this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2

score 3 · Accepted Answer

What you have is an HTML entity problem, not a unicode or UTF-8 problem. Try this:

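# Python 2: the HTMLParser module provides unescape()
import HTMLParser
h = HTMLParser.HTMLParser()
# Convert HTML entities such as &#039; back into the characters they stand for
s = h.unescape('If I&#039;m reading lots of articles')
print s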

This prints If I'm reading lots of articles.
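The snippet above is Python 2 code. On Python 3 the HTMLParser module no longer exists under that name, and as far as I know the unescape() method was removed from the parser class in 3.9. A minimal sketch of the same idea using the standard library's html.unescape (available since Python 3.4) might look like this; the variable names raw and text are only illustrative:

```python
import html

# Sample input taken from the question: the apostrophe is stored as
# the numeric character reference &#039;
raw = "If I&#039;m reading lots of articles"

# html.unescape converts numeric and named HTML entities to characters
text = html.unescape(raw)
print(text)

# Named entities are handled the same way
mixed = html.unescape("caf&eacute; &amp; bar")
print(mixed)
```

No encode() or decode() call is needed here, because the result is already an ordinary Python 3 str.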
