python - Filtering HTML with pyquery

Question

I am trying to parse HTML with pyquery, and I have run into a problem I can't explain. My code is as follows:

from pyquery import PyQuery as pq
document = pq('<p id="hello">Hello</p><p id="world">World !!</p>')
p = document('p')
print(p.filter("#hello"))

I expect it to print:

<p id="hello">Hello</p>

But the actual output is:

<p id="hello">Hello</p><p id="world">World !!</p></div></html>

How should I write this if I want only the selected part of the HTML, without the rest of the HTML content?

Thanks

score 1 · Accepted Answer

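You can use the standard library's ElementTree module. Note that it expects well-formed XML, which the markup in this example happens to be: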

import xml.etree.ElementTree as ET

html = '''<html><p id="hello">Hello</p><p id="world">World !!</p></html>'''
root = ET.fromstring(html)
p = root.find('.//p[@id="hello"]')
print(ET.tostring(p))

Output:

b'<p id="hello">Hello</p>'
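The `b'...'` prefix in that output appears because `ET.tostring` returns bytes by default. As a minimal sketch, assuming the input is well-formed XML as in the answer above, passing `encoding="unicode"` returns a plain `str` instead. The helper name `element_markup_by_id` is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def element_markup_by_id(markup, element_id):
    """Hypothetical helper: serialize the first element whose id matches.

    Assumes `markup` is well-formed XML; ET.fromstring raises
    ET.ParseError on markup that is not.
    """
    root = ET.fromstring(markup)
    # Walk the whole tree (root included) and compare id attributes.
    for element in root.iter():
        if element.get("id") == element_id:
            # encoding="unicode" makes tostring return str rather than bytes.
            return ET.tostring(element, encoding="unicode")
    return None


html = '<html><p id="hello">Hello</p><p id="world">World !!</p></html>'
print(element_markup_by_id(html, "hello"))
```

Returning `None` when no element matches lets the caller decide how to handle a missing id, rather than failing inside the serialization call.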
