python - Filtering results with findAll in BeautifulSoup

Question

import urllib2
from BeautifulSoup import BeautifulSoup

result = urllib2.urlopen("http://www.bbc.co.uk/news/uk-scotland-south-scotland-12380537")
html=result.read()
soup= BeautifulSoup(html)
print soup.html.head.title

print soup.findAll('div', attrs={ "class" : "story-body"})

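The problem seems to be that the information I want is inside the story body, but it sits at the very bottom, so I end up with a lot of junk before I get to it.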

print soup.findAll('p', attrs={ 'class' : "introduction"})

only gives me the first <p>; in this example there are 8 more to collect.

So I am hoping to collect everything from the introduction through to the end of the story body. Any ideas?

score 1 · Accepted Answer

In terms of CSS selectors, you want to select all p elements inside .story-body:

print soup.select('.story-body p')

http://www.crummy.com/software/BeautifulSoup/bs4/doc/index.html?highlight=select#css-selectors
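A minimal sketch of how that selector behaves, assuming the bs4 package is installed (`select` is a Beautiful Soup 4 method, so the import differs from the `from BeautifulSoup import BeautifulSoup` line in the question). The `SAMPLE_HTML` string is a hypothetical stand-in for the downloaded page, and the real BBC markup may well differ. The sibling-based variant at the end is an alternative to the CSS selector, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><head><title>Sample story</title></head>
<body>
<div class="story-body">
  <span class="byline">junk before the text</span>
  <p class="introduction">Intro paragraph.</p>
  <p>Second paragraph.</p>
  <p>Third paragraph.</p>
</div>
<p>Footer paragraph outside the story body.</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Descendant selector: every <p> inside an element with class "story-body".
paragraphs = soup.select(".story-body p")
texts = [p.get_text(strip=True) for p in paragraphs]

# Roughly the same result without CSS selectors:
# find the container first, then search only inside it.
story = soup.find("div", attrs={"class": "story-body"})
same_texts = [p.get_text(strip=True) for p in story.find_all("p")]

# Alternative: start at the introduction and take the <p> siblings after it.
intro = soup.find("p", attrs={"class": "introduction"})
from_intro = [intro] + intro.find_next_siblings("p")
intro_texts = [p.get_text(strip=True) for p in from_intro]

print(texts)
```

Calling `get_text()` on each match drops the surrounding tags, which may help with the "junk" mentioned in the question. The footer paragraph is skipped because it sits outside `.story-body`.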
