python - PyQuery doesn't return the elements on the page

Question

I have set up a Python script that opens this web page with PyQuery:

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
pqPage = PyQuery(page.content)

But pqPage("li") only returns an empty list, []. Meanwhile, pqPage.text() shows the page's HTML text, which does contain li elements.

Why doesn't the code return a list of li elements? How can I make it do that?

score 1 · Accepted Answer

PyQuery seems to have a problem with this page, possibly because it is an XHTML page, or possibly because it uses the namespace xmlns="http://www.w3.org/1999/xhtml".

When I use

pqPage.css('li')

I get

[<{http://www.w3.org/1999/xhtml}html#sfFrontendHtml>]

The {http://www.w3.org/1999/xhtml} shown in the element name is the namespace. Some modules have trouble with HTML that uses namespaces.
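To see why the namespace matters, here is a minimal sketch that uses the standard library's xml.etree.ElementTree in place of PyQuery. This is a swapped-in parser, chosen only because it needs no extra install. It assumes the page is served as well-formed XHTML with a default namespace; the XHTML string is made-up sample markup, not the real page. A namespace-aware XML parser stores each tag as {namespace}name, so a bare li lookup matches nothing, which is consistent with the empty result from pqPage("li").

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for an XHTML page with a default namespace
# (not the real floridaleagueofcities.com markup).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul>
      <li><p>Mayor</p></li>
      <li><p>Clerk</p></li>
    </ul>
  </body>
</html>"""

NS = "http://www.w3.org/1999/xhtml"

root = ET.fromstring(XHTML)

# The default namespace is folded into every tag name ("Clark notation").
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

# A plain, un-namespaced lookup matches nothing...
plain = root.findall(".//li")
print(plain)  # []

# ...while a namespace-qualified lookup finds the elements.
names = [li.find(f"{{{NS}}}p").text for li in root.findall(f".//{{{NS}}}li")]
print(names)  # ['Mayor', 'Clerk']
```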

I have no problem with it using BeautifulSoup:

import requests
from bs4 import BeautifulSoup as BS

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)

soup = BS(page.text, 'html.parser')
for item in soup.find_all('li'):
    print(item.text)
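BeautifulSoup's 'html.parser' backend is built on Python's standard html.parser module, which treats xmlns as an ordinary attribute and reports plain tag names. The sketch below shows that behaviour with the stdlib alone; LiTextCollector and the sample markup are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# Made-up XHTML sample with a default namespace.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul>
      <li><p>Mayor</p></li>
      <li><p>Clerk</p></li>
    </ul>
  </body>
</html>"""


class LiTextCollector(HTMLParser):
    """Hypothetical helper: records tag names and the text inside each <li>."""

    def __init__(self):
        super().__init__()
        self.tags_seen = []
        self.items = []
        self._depth = 0  # greater than 0 while inside an <li>

    def handle_starttag(self, tag, attrs):
        # xmlns arrives as a normal entry in attrs; tag is just "html", "li", ...
        self.tags_seen.append(tag)
        if tag == "li":
            self._depth += 1
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.items[-1] += data


parser = LiTextCollector()
parser.feed(XHTML)
parser.close()

print(parser.tags_seen[:3])  # ['html', 'body', 'ul']
print(parser.items)  # ['Mayor', 'Clerk']
```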

Edit: after some digging on Google, I found that by passing parser="html" to PyQuery() I can get the li elements.

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)

pqPage = PyQuery(page.text, parser="html")
for item in pqPage('li p'):
    print(item.text)
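parser="html" likely helps because lxml's HTML parser does not apply XML namespaces, so tags keep their plain names (without it, PyQuery tries an XML parse first, which succeeds on well-formed XHTML and keeps the namespace). If you must parse as XML, a different, common workaround is to strip the namespace from every tag after parsing. The sketch below shows that alternative with stdlib ElementTree rather than PyQuery; strip_namespaces and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up XHTML sample with a default namespace.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul>
      <li><p>Mayor</p></li>
      <li><p>Clerk</p></li>
    </ul>
  </body>
</html>"""


def strip_namespaces(root):
    """Hypothetical helper: rewrite every '{uri}tag' as plain 'tag', in place."""
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(XHTML))

# Plain tag names now match, roughly like the 'li p' selector in the answer.
texts = [p.text for p in root.findall(".//li/p")]
print(texts)  # ['Mayor', 'Clerk']
```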
