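I have text like this:
&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.
I know that &#8216; is an ASCII character. How can I convert it to a normal character without resorting to a cumbersome .replace?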
What you have there is an HTML escape. Use the unescape() method of the HTMLParser.HTMLParser() class to unescape these:
from HTMLParser import HTMLParser
parser = HTMLParser()
unescaped = parser.unescape(escaped)
Demo:
>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019 wrote forum user Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled.
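In Python 3, the HTMLParser module was renamed to html.parser. On versions before 3.9, where the unescape() method still exists, adjust the import accordingly: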
from html.parser import HTMLParser
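On Python 3.4 and later, the standalone html.unescape() function does the same job, and it is the only option from Python 3.9 onward because HTMLParser.unescape() was removed there. A minimal sketch, assuming the input still contains the numeric character references from the question (the variable names are only illustrative):

```python
import html

# Sample input, assumed to still carry the numeric character references.
escaped = (
    "&#8216;The zoom animations everywhere on the new iOS 7 are literally "
    "making me nauseous and giving me a headache,&#8217; wrote forum user "
    "Ensorceled."
)

# html.unescape() converts numeric and named character references
# into the corresponding Unicode characters.
unescaped = html.unescape(escaped)
print(unescaped)
```

This needs no parser instance, and it handles named references such as &amp; as well as numeric ones.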