python - How to extract the link from an <a> inside <h2 class="section-heading">: BeautifulSoup


I am trying to extract a link that is written like this:

<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>

My code is:

Asked 2016-02-12T06:13:16.503

Viewed 3878 times

First, you want the href of the a element, so on that line you should be accessing a, not n. Second, it should be

a.get('href')

or

a['href']

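If no such attribute is found, the latter form raises a KeyError, while the former returns None, just like the usual dictionary/mapping interface. Because .get is a method, it has to be called (.get(...)); indexing/element access does not work on it (.get[...]), which is the error this question is about.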
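To make the difference between the two forms concrete, here is a minimal sketch. The sample markup is copied from the question, and the 'title' attribute is just a hypothetical example of an attribute that is not present; 'html.parser' is used only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question.
html = ('<h2 class="section-heading">'
        '<a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>'
        '</h2>')
soup = BeautifulSoup(html, "html.parser")
a = soup.find("a")

href_get = a.get("href")    # method call: the value, or None if the attribute is absent
href_index = a["href"]      # item access: the value, or KeyError if the attribute is absent
missing = a.get("title")    # 'title' (hypothetical) is not set on this tag

try:
    a["title"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

try:
    a.get["href"]           # subscripting the method itself instead of calling it
    subscript_failed = False
except TypeError:
    subscript_failed = True
```

Use a.get('href') when a missing attribute is an expected case you want to test for, and a['href'] when a missing attribute should be treated as an error.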

Note that find can fail and return None, so you may want to iterate over n.find_all('a', href=True) instead:

for n in head_links:
    for a in n.find_all('a', href=True):
        print(a['href'])
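The following sketch shows why the nested find_all loop is the safer choice. It assumes that head_links holds the h2.section-heading elements (the question's own code is not shown here, so that is a guess), and the extra headings in the markup are hypothetical test cases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a heading with a link, one without any link,
# and one whose <a> has no href attribute.
html = """
<h2 class="section-heading"><a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a></h2>
<h2 class="section-heading">No link here</h2>
<h2 class="section-heading"><a name="anchor-only">Anchor</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumption: this is roughly how head_links was built in the question.
head_links = soup.find_all("h2", class_="section-heading")

# find returns None for the second heading, so a['href'] would fail there
# unless the result is checked first.
first_matches = [n.find("a") for n in head_links]

# find_all with href=True skips headings without a link and anchors without href.
hrefs = [a["href"] for n in head_links for a in n.find_all("a", href=True)]
```

With this input, first_matches contains a None entry for the heading without a link, while hrefs contains only the one real URL.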

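Simpler than find_all is the select method, which takes a CSS selector. Here, in a single operation, we get exactly those <a> elements that have an href attribute and sit inside an <h2 class="section-heading">, as easily as with jQuery.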

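from bs4 import BeautifulSoup

# plain_text is the page HTML from the question's code; naming the parser avoids a warning
soup = BeautifulSoup(plain_text, 'html.parser')
for a in soup.select('h2.section-heading a[href]'):
    print(a['href'])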
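As a quick check of what the selector does and does not match, here is a self-contained sketch. The markup is hypothetical apart from the first heading, which comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first link satisfies both conditions
# (inside h2.section-heading, and carrying an href attribute).
html = """
<h2 class="section-heading"><a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a></h2>
<h2 class="other"><a href="/elsewhere">Elsewhere</a></h2>
<h2 class="section-heading"><a>No href</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# 'h2.section-heading a[href]' means: any <a> with an href attribute
# that is a descendant of an <h2> with class section-heading.
selected = [a["href"] for a in soup.select("h2.section-heading a[href]")]
```

Because the [href] part of the selector already filters out anchors without the attribute, a['href'] cannot raise a KeyError inside this loop.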

Answered 2016-02-12T06:15:48.480