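I'm trying to extract the main headline from a BBC Sport page (currently: "Wenger predicts 'active' January"). Its ID is "lead-caption", and it sits inside an <h2> and an <a> tag. I'm using Python.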

from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("http://www.bbc.co.uk/sport/football/teams/arsenal")
soup=BeautifulSoup(url.read())
# Things I've tried
headline = soup.find('a', attrs={'id': 'lead-caption'})
print headline
# The above prints 'None'
headline1 = soup.find('lead-caption').getText()
print headline1
# The above raises: 'NoneType' object has no attribute 'getText'
tag = soup.a
tag['id'] = 'lead-caption'
type(tag)
print tag.string
# Error: 'NoneType' object does not support item assignment

Any help would be much appreciated. Thanks :)

1 Answer

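Your code is almost correct. You are searching for the wrong element, which is why you get None: the element with the ID "lead-caption" is a div, not an a. Find the div first, then look for the <a> inside it: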

headline=soup.find('div', attrs={'id': 'lead-caption'})
headline_text=headline.find('a').getText()
print headline_text

Output:

Wenger predicts 'active' January
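
Since the original failures all came from calling methods on None, it may help to guard each lookup. The following is only a minimal Python 3 sketch, not the answerer's code: `SAMPLE_HTML` is a made-up fragment that imitates the structure described above (a div with the ID "lead-caption" wrapping an <h2> and an <a>), `extract_headline` is a hypothetical helper name, and the live BBC page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Assumption: a hand-written fragment imitating the markup described in the
# thread; it is not copied from the real BBC page.
SAMPLE_HTML = """
<div id="lead-caption">
  <h2><a href="#">Wenger predicts 'active' January</a></h2>
</div>
"""


def extract_headline(html, container_id="lead-caption"):
    """Return the text of the first <a> inside the div with the given id,
    or None when either element is missing. (Hypothetical helper.)"""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns None when nothing matches, so check before chaining.
    container = soup.find("div", attrs={"id": container_id})
    if container is None:
        return None

    link = container.find("a")
    if link is None:
        return None

    return link.get_text(strip=True)


print(extract_headline(SAMPLE_HTML))
print(extract_headline("<p>no headline here</p>"))
```

With the sample fragment, the first call prints the headline text and the second prints None instead of raising an AttributeError. Note also that the first positional argument of find() is a tag name, so soup.find('lead-caption') looks for a <lead-caption> tag rather than an ID.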

Answered 2016-01-12T21:31:06.797