python-3.x - Python & Beautiful Soup - Searching result strings

Question

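I am using Beautiful Soup to parse an HTML table.

Python version 3.2
Beautiful Soup version 4.1.3

I am running into a problem when trying to use the findAll method to find the columns within a row. I get an error saying that the list object has no attribute findAll. I found this approach in another Stack Exchange post, where it was not a problem. (BeautifulSoup HTML table parsing)

I realize that findAll is a method of BeautifulSoup, not of Python lists. The odd part is that findAll works when I look for the rows within the table taken from the table list (I only need the second table on the page), but fails when I try to find the columns within the rows list.

Here is my code: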

from urllib.request import URLopener
from bs4 import BeautifulSoup

opener = URLopener() #Open the URL Connection
page = opener.open("http://www.labormarketinfo.edd.ca.gov/majorer/countymajorer.asp?CountyCode=000001") #Open the page
soup = BeautifulSoup(page)

table = soup.findAll('table')[1] #Get the 2nd table (index 1)
rows = table.findAll('tr') #findAll works here
cols = rows.findAll('td') #findAll fails here
print(cols)

score 4 · Accepted Answer

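findAll() returns a list of results. You need to either loop over those results or pick one of them, and then call that element's own findAll() method to reach the elements it contains: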

table = soup.findAll('table')[1]
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')  # call findAll on each row, not on the list
    print(cols)

Or pick a single row:

table = soup.findAll('table')[1]
rows = table.findAll('tr')
cols = rows[0].findAll('td')  # columns of the *first* row.
print(cols)

Note that findAll is deprecated; you should use find_all() instead.
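For reference, here is a minimal, self-contained sketch of the loop approach using find_all(). The HTML string, the names HTML and table_to_rows, and the choice of the "html.parser" backend are assumptions made for illustration only; the page from the question is not reproduced here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup with two tables; only the second one matters.
HTML = """
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><td>Employer A</td><td>City A</td></tr>
  <tr><td>Employer B</td><td>City B</td></tr>
</table>
"""


def table_to_rows(html, table_index=1):
    """Return the cell text of one table as a list of rows (lists of strings)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like result, so index it to get a single table.
    table = soup.find_all("table")[table_index]
    # Call find_all on each individual row element, never on the list of rows.
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
    ]


rows = table_to_rows(HTML)
for row in rows:
    print(row)
```

The key point is the same as in the answer: every find_all() call is made on a single element (the soup, one table, one row), while the lists it returns are only indexed or iterated.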
