I have text like this:

&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.

I know that &#8216; is an ASCII character. How can I convert it to a normal character without resorting to a cumbersome .replace?

1 Answer

You have an HTML escape there. Use the HTMLParser.HTMLParser() class to unescape these:

from HTMLParser import HTMLParser

parser = HTMLParser()
unescaped = parser.unescape(escaped)

Demo:

>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019 wrote forum user Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled.

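In Python 3, the HTMLParser module has been renamed to html.parser; adjust the import accordingly (this works up to Python 3.8, since the unescape() method was deprecated in 3.4 and removed in 3.9 in favor of html.unescape()):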

from html.parser import HTMLParser
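
On Python 3.4 and later, the standard library also provides the html.unescape() function, which does the same job without creating a parser object. A minimal sketch, assuming the text from the question is held in a variable named escaped (the variable names here are only illustrative):

```python
import html

# The sample text from the question, with its numeric character references.
escaped = (
    "&#8216;The zoom animations everywhere on the new iOS 7 are literally "
    "making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled."
)

# html.unescape() converts named and numeric character references
# (such as &#8216; or &amp;) to the corresponding Unicode characters.
unescaped = html.unescape(escaped)
print(unescaped)
```

This handles every character reference in one call, so no chain of .replace calls is needed.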
Answered 2013-09-28T15:37:56.713