import requests
from bs4 import BeautifulSoup as bs
r=requests.get('https://www.crummy.com/software/BeautifulSoup/')
soup=bs(r.text,'html.parser')
links=[x['href'] for x in soup.find_all('a')]
links

The error is:

KeyError                                  Traceback (most recent call last)
<ipython-input-137-97ef77b6e69a> in <module>
----> 1 links=[x['href'] for x in soup.find_all('a')]
      2 links

<ipython-input-137-97ef77b6e69a> in <listcomp>(.0)
----> 1 links=[x['href'] for x in soup.find_all('a')]
      2 links

~\anaconda3\lib\site-packages\bs4\element.py in __getitem__(self, key)
   1319         """tag[key] returns the value of the 'key' attribute for the Tag,
   1320         and throws an exception if it's not there."""
-> 1321         return self.attrs[key]
   1322 
   1323     def __iter__(self):

KeyError: 'href'

However, the following code works fine:

import requests
from bs4 import BeautifulSoup as bs
r=requests.get('https://en.wikipedia.org/wiki/Harvard_University')
soup=bs(r.text,'html.parser')
classes=[table['class'] for table in soup.find_all('table')]
classes

1 Answer


The first site contains the following element:

<a name="Download">

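This anchor has no href attribute (it is not a link; it serves as the target of the #Download fragment), so tag['href'] raises a KeyError when the loop reaches it.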
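To see the failure in isolation, here is a minimal sketch that parses a small hypothetical HTML string (not the real page) containing one ordinary link and one named anchor like the one above. Indexing with tag['href'] raises KeyError on the named anchor, while tag.get('href') returns None instead.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the page: one real link and one
# named anchor (a fragment target) that has no href attribute.
html = '<a href="https://example.com/">a link</a><a name="Download">'
soup = BeautifulSoup(html, 'html.parser')

for tag in soup.find_all('a'):
    # tag.get() returns None instead of raising when the attribute is absent
    print(tag.get('href'))

try:
    # tag['href'] looks the key up in tag.attrs and raises KeyError if missing
    [tag['href'] for tag in soup.find_all('a')]
except KeyError as exc:
    print('KeyError:', exc)
```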

You can use a CSS selector to filter the tags down to only those that have an href attribute:

links=[x['href'] for x in soup.select('a[href]')]
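As a rough check, the sketch below runs the a[href] selector against a hypothetical HTML string standing in for the downloaded page, and compares it with find_all('a', href=True), which applies the same "attribute must be present" filter without CSS syntax. Both skip the named anchor, so neither raises KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page.
html = '''
<a name="Download"></a>
<a href="/docs/">Documentation</a>
<a href="https://example.com/">Example</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# CSS attribute selector: only <a> tags that carry an href
via_select = [a['href'] for a in soup.select('a[href]')]

# Equivalent filter via find_all's attribute keyword (href=True means "present")
via_find_all = [a['href'] for a in soup.find_all('a', href=True)]

print(via_select)
print(via_select == via_find_all)
```

Which one to use is mostly a matter of taste; select() is handy when you need a more complex selector, such as restricting links to one part of the page.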
answered 2020-04-28T07:02:31.933