python - Converting escaped HTML back to regular HTML? - Python

Question

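I'm using BeautifulSoup to process XML files that I collect through a REST API.

The responses contain HTML code, but BeautifulSoup escapes all of the HTML tags so that the text displays nicely.

Unfortunately, I need the actual HTML code.

How would I go about converting the escaped HTML back into proper markup?

Any help is much appreciated!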

score 15 · Accepted Answer

I think you want xml.sax.saxutils.unescape from the Python standard library.

For example:

>>> from xml.sax import saxutils as su
>>> s = '&lt;foo&gt;bar&lt;/foo&gt;'
>>> su.unescape(s)
'<foo>bar</foo>'
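
As a small extension of the session above, here is a sketch under a few assumptions. By default, saxutils.unescape only converts &lt;, &gt; and &amp;, and any other entity has to be passed in through its optional dictionary argument. On Python 3.4 or later, html.unescape from the standard library decodes all named and numeric character references. The sample string is made up for illustration.

```python
import html
from xml.sax import saxutils

# Sample escaped markup (made up for illustration).
escaped = '&lt;p class=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/p&gt;'

# saxutils.unescape handles &lt;, &gt; and &amp; by default;
# other entities must be supplied through the optional dict.
sax_result = saxutils.unescape(escaped, {'&quot;': '"'})

# html.unescape (Python 3.4+) decodes all named and numeric references.
html_result = html.unescape(escaped)

print(sax_result)
print(html_result)
```

Both calls should yield the same markup for this input. Which one fits better probably depends on whether the data contains entities beyond the three basic XML ones.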

score 2 · Accepted Answer

Could you try the urllib module?

It has an unquote() function that might suit your needs.
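
For reference, here is a minimal sketch of what unquote() does, assuming Python 3, where the function lives in urllib.parse. It decodes URL percent-escapes such as %3C and, as far as I can tell, leaves HTML entities such as &lt; untouched, so it only helps if the data is URL-encoded.

```python
from urllib.parse import unquote

# Percent-escaped text (as found in URLs) is decoded...
url_decoded = unquote('%3Cfoo%3Ebar%3C%2Ffoo%3E')

# ...but HTML entities pass through unchanged.
entity_text = unquote('&lt;foo&gt;bar&lt;/foo&gt;')

print(url_decoded)
print(entity_text)
```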

Edit: On second thought (and after reading your question more closely), you probably just want to use string.replace()

Like this:

# str.replace() returns a new string, so assign the result back
string = string.replace('&lt;', '<')
string = string.replace('&gt;', '>')
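
The replace() idea can be wrapped in a small helper. This is only a sketch: the name unescape_basic is hypothetical, and it covers just the three basic XML escapes. The one design choice worth noting is that &amp; is replaced last, so that text which was escaped twice is only unescaped by one level.

```python
def unescape_basic(text):
    """Undo the three basic XML escapes using only str.replace()."""
    # str.replace() returns a new string, so each result is reassigned.
    text = text.replace('&lt;', '<')
    text = text.replace('&gt;', '>')
    # Replace &amp; last so that '&amp;lt;' becomes '&lt;' rather than '<'.
    text = text.replace('&amp;', '&')
    return text


result = unescape_basic('&lt;foo&gt;a &amp;lt; b&lt;/foo&gt;')
print(result)
```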
