
I have a string like this:

THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist.

As you can see, the apostrophe (') is represented by its ASCII code:

&#39

How would you suggest I decode this string?

Codes for other characters also appear, for example:

"
&
4



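These are called HTML entities (character references). The simplest way to unescape them is with HTMLParser from the standard library (Python 2):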

>>> s = "THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist."
>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape(s)
u"THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you've read the first two bestselling collections of SAGA , you're all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist."
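The session above only runs on Python 2. In Python 3 the module was renamed to `html.parser`, and `HTMLParser.unescape` was deprecated and later removed (in 3.9); the standard-library replacement is `html.unescape`, available since Python 3.4. A minimal Python 3 sketch follows. The sample strings are shortened for illustration, and the spellings `&quot;`, `&amp;` and `&#52;` for the other characters in the question are assumptions, since the question does not show them.

```python
import html

# Shortened sample containing numeric character references, as in the question
s = "Now that you&#39;ve read the first two bestselling collections of SAGA, you&#39;re all caught up."

# html.unescape converts named (&amp;, &quot;) and numeric (&#39;, &#x27;) references
decoded = html.unescape(s)
print(decoded)

# Assumed entity spellings for the other characters mentioned in the question
print(html.unescape("&quot;quoted&quot; &amp; &#52;"))

# Numeric references without the trailing semicolon (e.g. "&#39") are also decoded
print(html.unescape("you&#39ve"))
```

This prints `Now that you've read ... all caught up.`, then `"quoted" & 4`, then `you've`.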


answered 2013-08-13T23:16:11.640