python - How do I get BeautifulSoup 4 to respect a self-closing tag?

Question

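This question is specific to BeautifulSoup4, which makes it different from the earlier questions on this topic:

Now that BeautifulStoneSoup (the old XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example: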

import bs4   
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])

print soup.prettify()

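This does not self-close the bar tag; it emits a warning instead. What is the tree builder that bs4 refers to, and how do I get the tag to self-close?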

/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
  "BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>

score 15 · Accepted Answer

To parse XML, pass "xml" as the second argument to the BeautifulSoup constructor.

soup = bs4.BeautifulSoup(S, 'xml')

You will need to have lxml installed.
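If lxml may be missing on the target machine, the call can be guarded. The following is a minimal sketch in Python 3 syntax; the helper name `parse_xml` is hypothetical, and it assumes that bs4 raises `FeatureNotFound` when no installed parser provides the "xml" feature.

```python
from bs4 import BeautifulSoup, FeatureNotFound

S = '<foo> <bar a="3"/> </foo>'


def parse_xml(markup):
    """Hypothetical helper: parse with the XML tree builder, or return None if it is unavailable."""
    try:
        return BeautifulSoup(markup, "xml")
    except FeatureNotFound:
        # No installed parser offers the "xml" feature (lxml is likely missing).
        return None


soup = parse_xml(S)
if soup is None:
    print("XML parsing needs lxml; try: pip install lxml")
else:
    # With the XML builder, the empty bar element stays self-closing.
    print(soup.bar)
```

Returning None keeps the sketch short; real code might prefer to re-raise with a clearer message.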

You no longer need to pass selfClosingTags:

In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print soup.prettify()
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
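As for the tree builder that the warning mentions: it is the parser-specific component that decides, among other things, which tags may be written as empty elements. The sketch below (Python 3 syntax, using only the built-in "html.parser" builder) is meant to illustrate why the original call expanded bar: the HTML builders only treat the standard HTML void elements, such as br, as self-closing, and an unknown tag like bar is not among them. The added `<br>` in the sample markup is an assumption made for the comparison and is not part of the question.

```python
from bs4 import BeautifulSoup

# <br> is an HTML void element; <bar> is an unknown tag (sample markup extended for comparison).
soup = BeautifulSoup('<foo> <bar a="3"/> <br> </foo>', "html.parser")

for tag in (soup.bar, soup.br):
    # can_be_empty_element is set from the tree builder's rules when the tag is created.
    print(tag.name, tag.can_be_empty_element, str(tag))
```

Under the XML builder there is no such fixed list, which is why any tag without content, including bar, is written in self-closing form.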
