I ran into a problem while testing this BeautifulSoup scraper. Please forgive any obvious mistakes; this is only my third hour with Python. Here is the code I have:
def huffpost_crawl():
    article_list = []
    DOMAIN = 'huffingtonpost.com'
    huff_soup = BeautifulSoup(urllib2.urlopen("http://www.huffingtonpost.com").read())
    news_list = huff_soup.find_all("div", {"class", "snp_most_popular_entry"})[0]
    for news in news_list[0]:
        title = news('div', {'class', 'snp_most_popular_entry_desc'})[0].a.get_text()
        full_url = news('div', {'class', 'snp_most_popular_entry_image'}).a["href"]
        blurb = ""
        thumb_url = news('div', {'class',
                                 'snp_most_popular_entry_image'}).a.img["longdesc"]
        print title

huffpost_crawl()
When I run python test.py in the terminal, I get back:
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    huffpost_crawl()
  File "test.py", line 11, in huffpost_crawl
    for news in news_list[0]:
  File "/usr/local/lib/python2.7/site-packages/bs4/element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 0
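
For what it's worth, the traceback points at the likely cause: find_all(...)[0] already picks a single Tag out of the result list, and indexing a Tag (news_list[0]) is an attribute lookup (self.attrs[key]), hence KeyError: 0. Note also that {"class", "..."} is a set literal; a dict such as {"class": "..."} or the class_ keyword was probably intended. Below is a minimal sketch of one possible fix, not a definitive one. It assumes each entry contains one "desc" div and one "image" div. The SAMPLE_HTML markup and the parse_most_popular helper are hypothetical stand-ins, so the example runs without fetching the live page, whose real structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; only the class names
# come from the question, the rest is made up for illustration.
SAMPLE_HTML = """
<div class="snp_most_popular_entry">
  <div class="snp_most_popular_entry_image">
    <a href="http://www.huffingtonpost.com/story-1"><img longdesc="http://example.com/thumb1.jpg"></a>
  </div>
  <div class="snp_most_popular_entry_desc">
    <a href="http://www.huffingtonpost.com/story-1">First story</a>
  </div>
</div>
<div class="snp_most_popular_entry">
  <div class="snp_most_popular_entry_image">
    <a href="http://www.huffingtonpost.com/story-2"><img longdesc="http://example.com/thumb2.jpg"></a>
  </div>
  <div class="snp_most_popular_entry_desc">
    <a href="http://www.huffingtonpost.com/story-2">Second story</a>
  </div>
</div>
"""


def parse_most_popular(html):
    """Return a list of dicts, one per 'most popular' entry (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    # find_all returns a list-like ResultSet: iterate over it directly
    # instead of taking [0] and then indexing the resulting Tag.
    for news in soup.find_all("div", class_="snp_most_popular_entry"):
        # find returns the first matching Tag (or None), so .a works on it.
        desc = news.find("div", class_="snp_most_popular_entry_desc")
        image = news.find("div", class_="snp_most_popular_entry_image")
        if desc is None or image is None or desc.a is None or image.a is None:
            continue  # skip entries that do not match the assumed layout
        img = image.a.img
        articles.append({
            "title": desc.a.get_text(strip=True),
            "full_url": image.a["href"],
            "thumb_url": img.get("longdesc") if img is not None else None,
        })
    return articles


articles = parse_most_popular(SAMPLE_HTML)
for article in articles:
    print(article["title"])
```

In the original function, the same idea would mean dropping both [0] indexes on news_list and using news.find(...) where a single Tag is expected, since calling news(...) is shorthand for find_all and returns a list that has no .a attribute.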