python - Narrowing down what I scrape from a website with Python

Question

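I'm practicing web scraping with Python, but I can't narrow the results down to a reasonable size without Python failing to recognize what I'm asking for. For example, here is my code: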

import bs4
import requests

url = requests.get('https://ballotpedia.org/Alabama_Supreme_Court')
soup = bs4.BeautifulSoup(url.text, 'html.parser')
y = soup.find('table')
print(y)

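I'm trying to get the names of the Alabama Supreme Court judges, but this code gives me far too much information. I've tried things like this (on line 6):

y = soup.find('table', {'class': 'wikitable sortable'})

but I get a message saying the search returned no results.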

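Here is an image of the page in the browser inspector. My goal was to get thead working in my code, but it failed!

How do I tell Python that I only want the judges' names?

Thanks a lot!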

score 3 · Accepted Answer

Simply put, here is how I would do it:

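import pandas as pd

# read_html parses every <table> on the page into a list of DataFrames
# (it needs lxml, or BeautifulSoup4 plus html5lib, installed).
# [2] picks the third table, which holds the judges on this page;
# that index depends on the page layout. "Judge" is the name column.
judges = pd.read_html("https://ballotpedia.org/Alabama_Supreme_Court")[2]["Judge"]

print(judges.to_list())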

Output:

['Brad Mendheim', 'Kelli Wise', 'Michael Bolin', 'William Sellers', 'Sarah Stewart', 'Greg Shaw', 'Tommy Bryan', 'Jay Mitchell', 'Tom Parker']

Now back to solving the original issue, since I personally prefer to fix the real problem rather than steer you toward an alternative solution.

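There is a difference between find, which returns only the first matching element, and find_all, which returns a list of all matching elements. Check the documentation.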
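A minimal sketch of that difference, run on a small made-up HTML snippet rather than the Ballotpedia page:

```python
from bs4 import BeautifulSoup

# A made-up snippet standing in for the real page: two tables, no <thead>
html = """
<table><tr><td>first table</td></tr></table>
<table><tr><td>second table</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")      # a single Tag (None if nothing matches)
every = soup.find_all("table")  # a list of Tags (empty if nothing matches)

print(first.get_text(strip=True))               # first table
print([t.get_text(strip=True) for t in every])  # ['first table', 'second table']
print(soup.find("thead"))                       # None
print(soup.find_all("thead"))                   # []
```

So when find returns None, the filter matched nothing; when it returns a Tag, you only got the first match.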

Import it directly with from bs4 import BeautifulSoup instead of import bs4, so you don't have to repeat the bs4. prefix every time (in line with the DRY principle).

Leave decoding the content to bs4, since that is one of the tasks it handles in the background. So instead of r.text, use r.content.
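A minimal sketch of what "letting bs4 handle it" means: raw bytes (what r.content holds) are passed straight to BeautifulSoup, which works out the encoding itself. The page below is made up and simply declares its charset in a meta tag:

```python
from bs4 import BeautifulSoup

# Hypothetical raw bytes, UTF-8 encoded, with the charset declared in <meta>
raw = '<html><head><meta charset="utf-8"></head><body><p>Café</p></body></html>'.encode("utf-8")

# No manual .decode(): Beautiful Soup detects the encoding from the bytes
soup = BeautifulSoup(raw, "html.parser")

print(soup.p.get_text())       # Café
print(soup.original_encoding)  # the encoding Beautiful Soup settled on
```

With r.text, by contrast, requests picks the encoding (largely from the HTTP headers) before Beautiful Soup ever sees the document.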

Now let's dig into the HTML and select the elements:

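from bs4 import BeautifulSoup
import requests

r = requests.get("https://ballotpedia.org/Alabama_Supreme_Court")
# Pass the raw bytes and let Beautiful Soup detect the encoding
soup = BeautifulSoup(r.content, 'html.parser')

# Select every <a> inside the table that has all three classes,
# then keep only each link's text (the judges' names)
print([item.text for item in soup.select(
    "table.wikitable.sortable.jquery-tablesorter a")])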

From here, you should read up on CSS selectors.
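One plausible reason the question's find('table', {'class': 'wikitable sortable'}) came back empty, sketched under an assumption: that the served table carries a third class, as the answer's selector suggests (the thread does not confirm the raw markup). In Beautiful Soup, a class string containing spaces is compared against the whole attribute value, while a CSS selector only requires each listed class to be present. The markup and the name "Jane Doe" are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a table with three classes
html = ('<table class="wikitable sortable jquery-tablesorter">'
        '<tr><td><a href="#">Jane Doe</a></td></tr></table>')
soup = BeautifulSoup(html, "html.parser")

# A multi-word class string must equal the full attribute value, so no match
print(soup.find("table", {"class": "wikitable sortable"}))  # None

# A single class name matches if it is any one of the table's classes
print(soup.find("table", class_="sortable") is not None)   # True

# A CSS selector requires both classes, in any order, and ignores extra ones
print([a.get_text() for a in soup.select("table.wikitable.sortable a")])  # ['Jane Doe']
```

This is also why select() is the usual way to match on two or more classes at once.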

Output:

['Brad Mendheim', 'Kelli Wise', 'Michael Bolin', 'William Sellers', 'Sarah Stewart', 'Greg Shaw', 'Tommy Bryan', 'Jay Mitchell', 'Tom Parker']
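Since the question mentions trying to use thead, here is a hedged sketch of reading only the "Judge" column by locating it in the header row, so that other links in a row are not picked up. Only the "Judge" header comes from the pandas output above; the rest of the table (the "Office" column, the names, the presence of a <thead> at all) is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical table layout; only the "Judge" header is taken from the thread
html = """
<table class="wikitable sortable">
  <thead><tr><th>Office</th><th>Judge</th></tr></thead>
  <tbody>
    <tr><td>Chief Justice</td><td><a href="#">Jane Doe</a></td></tr>
    <tr><td>Associate Justice</td><td><a href="#">John Roe</a></td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table.wikitable.sortable")

# Read the header cells from <thead> and find the "Judge" column by name
headers = [th.get_text(strip=True) for th in table.select("thead th")]
judge_col = headers.index("Judge")

# Keep only that column from each body row
names = [
    row.find_all("td")[judge_col].get_text(strip=True)
    for row in table.select("tbody tr")
]
print(names)  # ['Jane Doe', 'John Roe']
```

If the HTML that requests receives has no <thead> (the header row may sit as the first row of <tbody> instead), the headers would have to be read from that first row's <th> cells.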
