python - How to extract href links from anchor tags in BeautifulSoup

Question

Possible duplicate:
BeautifulSoup getting href

I am using BeautifulSoup, and below is my code:

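import urllib2
# BeautifulSoup 4; on BeautifulSoup 3 use: from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup

data = urllib2.urlopen("some_url")
html_data = data.read()
soup = BeautifulSoup(html_data)
href_tags = soup.findAll('a')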

Result:

href_tags = 
[<a href="http://www.exampl.com/score_card" target="_blank" style="font-family:arial;color:#192e94;">Click Here</a>, 
<a href="https://example.icims.com/jobs/search?pr=5">what is this</a>,
<a href="https://example.com/search?pr=6">Cool</a>,
<a href="https://example.com/help/host/search?pr=7">Hello</a>]

What I actually want is the href value from every anchor tag, not the tags themselves. How can I extract the href attributes?

Thanks in advance.

score 2 · Accepted Answer

Try looping over the matches:

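import urllib2
# BeautifulSoup 4; on BeautifulSoup 3 use: from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup

data = urllib2.urlopen("some_url")
html_data = data.read()
soup = BeautifulSoup(html_data)

# href=True matches only <a> tags that actually have an href attribute
for a in soup.findAll('a', href=True):
    print a['href']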
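
The snippet above targets Python 2 (urllib2 and the print statement). As a rough sketch of the same loop on Python 3, assuming the bs4 package is installed: the sample markup is adapted from the question's output with one href-less anchor added for illustration, and extract_hrefs is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question's output; a real script would
# read the page with urllib.request.urlopen(url).read() instead.
HTML = """
<a href="http://www.exampl.com/score_card" target="_blank">Click Here</a>
<a href="https://example.icims.com/jobs/search?pr=5">what is this</a>
<a name="no-link">anchor without href</a>
"""


def extract_hrefs(html):
    """Return the href of every <a> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only anchors that carry an href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for link in extract_hrefs(HTML):
        print(link)
```

Passing "html.parser" explicitly picks the parser bundled with Python, so the sketch does not depend on lxml being installed.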

score 0 · Accepted Answer

Off the top of my head:

href_tags = [tag['href'] for tag in soup.findAll('a', {'href': True})]

The {'href': True} filter makes sure every matched tag has an href attribute, so that tag['href'] does not fail.
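
To illustrate why the filter matters, here is a small sketch, assuming Python 3 with the bs4 package; the two-anchor markup is made up for the example. It compares indexing without the filter, indexing with it, and tag.get(), which returns None instead of raising.

```python
from bs4 import BeautifulSoup

# Made-up markup: one anchor with href, one without
soup = BeautifulSoup('<a href="/help">Help</a><a name="top">Top</a>', "html.parser")

# Without the filter, indexing the anchor that lacks href raises KeyError
try:
    unfiltered = [tag["href"] for tag in soup.find_all("a")]
except KeyError:
    unfiltered = None

# With the filter, only anchors that have an href are visited
filtered = [tag["href"] for tag in soup.find_all("a", {"href": True})]

# Alternative: tag.get() yields None for a missing attribute
with_get = [tag.get("href") for tag in soup.find_all("a")]

print(unfiltered, filtered, with_get)
```

Filtering in the query is usually the tidier choice when only real links are wanted; tag.get() is handy when the href-less anchors need to be seen as well.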
