python - How do I access a span using BeautifulSoup?

Question

I want to get the number inside a nested tag. How can I do that?

My code outputs this, but I want just #40 rather than the whole two lines:

<span class="rankings-score">
<span>#40</span>

Here is my code:

from bs4 import BeautifulSoup
import requests
import csv

site =  "http://www.usnews.com/education/best-high-schools/national-rankings/page+2"

fields = ['national_rank','school','address','school_page','medal','ratio','size_desc','students','teachers'] 

r = requests.get(site)
html_source = r.text
soup = BeautifulSoup(html_source)

table = soup.find('table')    
rows_list = []      

for row in table.find_all('tr'):

    d = dict()

    d['national_rank'] = row.find("span", 'rankings-score')
    print(d['national_rank'])

I get this error:

AttributeError: 'NoneType' object has no attribute 'span'

when I try this:

d['national_rank'] = row.find("span", 'rankings-score').span.text

score 6 · Accepted Answer

To access the text of the nested span:

score_span = row.find("span", 'rankings-score')
if score_span is not None:
    print(score_span.span.text)

You need to make sure that row.find("span", 'rankings-score') actually found something; above, I test that a <span> was indeed found.

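The .find() method returns None if no matching object was found. So in general, whenever you get an AttributeError: 'NoneType' object has no attribute ... exception involving an object you tried to load with Element.find(), you need to test for None before trying to access further information.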
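As a rough illustration of that check, here is a minimal, self-contained sketch. The inline markup, the school name, and the helper extract_rank are made up for the example and only stand in for the real page, which may be structured differently. A header row typically has no rankings-score span, which is a likely (but unconfirmed) reason the original loop hit None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: a header row without
# a rankings span, followed by a data row that has one.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>School</th></tr>
  <tr>
    <td><span class="rankings-score"><span>#40</span></span></td>
    <td>Example High School</td>
  </tr>
</table>
"""


def extract_rank(row):
    """Return the rank text for a table row, or None if the row has no rank.

    extract_rank is an illustrative helper, not part of BeautifulSoup.
    """
    score_span = row.find("span", "rankings-score")
    # Both the outer span and the nested span may be missing, so test each
    # lookup before going one level deeper.
    if score_span is None or score_span.span is None:
        return None
    return score_span.span.text.strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ranks = [extract_rank(row) for row in soup.find("table").find_all("tr")]
print(ranks)
```

Here the header row yields None and the data row yields the rank text, so the loop can skip rows without a rank instead of raising.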

This applies to object.find, object.find_all, object[...] tag attribute access, object.<tagname>, object.select, and so on.
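How "nothing found" shows up differs between these accessors, so the test you write differs too. The sketch below uses a throwaway one-line document (an assumption for illustration, not the page from the question) to show what each lookup gives back when there is no match.

```python
from bs4 import BeautifulSoup

# Throwaway document: a <div> with an id but no class and no <span> inside.
soup = BeautifulSoup('<div id="box"><p>text</p></div>', "html.parser")
div = soup.find("div")

no_find = div.find("span")            # single-result lookup: None
no_tagname = div.span                 # attribute-style lookup: None
no_find_all = div.find_all("span")    # multi-result lookup: empty list
no_select = div.select("span.x")      # CSS selector lookup: empty list
no_get = div.get("class")             # .get() on a missing attribute: None

# Subscript access on a missing attribute raises instead of returning None.
try:
    div["class"]
    missing_attribute_raises = False
except KeyError:
    missing_attribute_raises = True

print(no_find, no_tagname, len(no_find_all), len(no_select), no_get,
      missing_attribute_raises)
```

So a None test suits find and object.<tagname>, an emptiness test suits find_all and select, and .get() or a KeyError handler suits attribute access.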
