I have an HTML text: If I&#39;m reading lots of articles
I am trying to replace &#39; and other such special characters with the Unicode character '. I tried

    rawtxt.encode('utf-8').encode('ascii','ignore')

but it fails with:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2
You have an HTML entity problem, not a Unicode or UTF-8 problem. Try this:
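
    # Python 2: the HTMLParser module provides unescape() for HTML entities
    import HTMLParser

    h = HTMLParser.HTMLParser()
    # The entity is written out as &#39; here so the string literal is valid
    s = h.unescape('If I&#39;m reading lots of articles')
    print s
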
This prints If I'm reading lots of articles.
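
On Python 3 the snippet above will not run as written: the `HTMLParser` module became `html.parser`, and the `unescape` method was removed in Python 3.9. The standard-library function `html.unescape` (available since Python 3.4) covers the same case. A minimal sketch, assuming the entity in the sample is `&#39;`; the second string is a made-up example to show named entities:

```python
import html

# Sample from the question, with the apostrophe written as a numeric entity
# (assumption: the original entity was &#39;).
raw = "If I&#39;m reading lots of articles"
text = html.unescape(raw)
print(text)  # If I'm reading lots of articles

# Hypothetical second sample: named entities are decoded the same way.
other = html.unescape("caf&eacute; &amp; bar")
print(other)  # café & bar
```

The result is an ordinary `str`, so no `encode`/`decode` round trip is needed just to get rid of the entities.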