python - BeautifulSoup - extracting links inside a div

Question

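I have a soup with the following structure:

There are many divs; the ones I'm interested in are those with the "foo" class.

Each div contains many links and other content. I'm interested in the second link (the second <a> </a>); it is always the second one. I want to grab the link (in the href attribute) and the text between the tags of that second <a> </a>.

For example: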

<div class ="foo">
     <a href ="http://example.com"> </a>
     <a href ="http://example2.com"> Title here </a>
</div>

<div class ="foo">
     <a href ="http://example3.com"> </a>
     <a href ="http://example4.com"> Title 2 here </a>
</div>

From this I want to get:

Title here => http://example2.com

Title 2 here => http://example4.com

I tried writing some code:

soup.findAll("div", { "class" : "foo" })

But this returns a list of all the matching divs along with their contents, and I don't know how to go further from there.

Thanks :)

score 10 · Accepted Answer

Iterate over the divs and find the a tags inside each one.

from bs4 import BeautifulSoup

example = '''
<div class ="foo">
     <a href ="http://example.com"> </a>
     <a href ="http://example2.com"> Title here </a>
</div>

<div class ="foo">
     <a href ="http://example3.com"> </a>
     <a href ="http://example4.com"> Title 2 here </a>
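</div>
'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(example, 'html.parser')
for div in soup.find_all('div', {'class': 'foo'}):
    # The second <a> in each div is the one of interest (index 1)
    a = div.find_all('a')[1]
    print(a.text.strip(), '=>', a['href'])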
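
The loop above assumes every "foo" div contains at least two links; a div with fewer would raise an IndexError. As a minimal sketch (not part of the original answer), the same idea can be wrapped in a function that skips such divs and returns the pairs instead of printing them. The function name second_links and its css_class parameter are made up for illustration.

```python
from bs4 import BeautifulSoup


def second_links(html, css_class="foo"):
    """Return (text, href) for the second <a> in each div with the given class.

    Hypothetical helper for illustration; divs with fewer than two links are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_=css_class):
        # limit=2 stops the search once two <a> tags have been found
        anchors = div.find_all("a", limit=2)
        if len(anchors) < 2:
            continue  # no second link in this div
        second = anchors[1]
        # .get("href") returns None instead of raising if the attribute is missing
        results.append((second.get_text(strip=True), second.get("href")))
    return results


example = """
<div class ="foo">
     <a href ="http://example.com"> </a>
     <a href ="http://example2.com"> Title here </a>
</div>

<div class ="foo">
     <a href ="http://example3.com"> </a>
     <a href ="http://example4.com"> Title 2 here </a>
</div>
"""

for title, href in second_links(example):
    print(title, "=>", href)
```

Returning a list keeps the extraction separate from the output, so the pairs can be written to a file or filtered further instead of only printed.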
