python - KeyError returned when calling a function

Question

I ran into a problem while testing this BeautifulSoup crawler. Please forgive me if the mistake is obvious; this is my third hour with Python. Here is the code I have...

def huffpost_crawl():
 article_list = []
 DOMAIN = 'huffingtonpost.com'
 huff_soup = BeautifulSoup(urllib2.urlopen("http://www.huffingtonpost.com").read())
 news_list = huff_soup.find_all("div", {"class", "snp_most_popular_entry"})[0]
 for news in news_list[0]:
    title = news('div', {'class', 'snp_most_popular_entry_desc'})[0].a.get_text()
    full_url = news('div', {'class', 'snp_most_popular_entry_image'}).a["href"]
    blurb = ""
    thumb_url = news('div', {'class', 
   'snp_most_popular_entry_image'}).a.img["longdesc"]


 print title

huffpost_crawl()

When I run python test.py in the terminal, I get...

Traceback (most recent call last):
  File "test.py", line 21, in <module>
  huffpost_crawl()
File "test.py", line 11, in huffpost_crawl
  for news in news_list[0]:
File "/usr/local/lib/python2.7/site-packages/bs4/element.py", line 879, in __getitem__
  return self.attrs[key]
KeyError: 0

score 1 · Accepted Answer

It looks like news_list is a dictionary (key-value pairs) and has no key 0. If it were a list you were trying to index, that would work. So instead of your

for news in news_list[0]:

line, try

for key, news in news_list.iteritems():

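This iterates over every item in the dictionary. If you only want the first result, I am not sure how you would determine which one that is. Try printing the items to see what is actually being returned.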
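For what it's worth, the last frame of the traceback is in bs4's __getitem__, which returns self.attrs[key]. That suggests the object being indexed is a parsed element whose [] looks things up by attribute name rather than by position. Below is a minimal sketch of that behaviour. FakeTag is a hypothetical stand-in written for this example (not the real bs4 class), and a plain list stands in for whatever find_all returns:

```python
# Hypothetical stand-in for a parsed element; NOT the real bs4 class.
# It copies only the behaviour visible in the traceback:
# __getitem__ returns self.attrs[key].
class FakeTag(object):
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs

    def __getitem__(self, key):
        return self.attrs[key]


# A plain list plays the role of the search results (assumption).
results = [
    FakeTag("div", {"class": "snp_most_popular_entry"}),
    FakeTag("div", {"class": "snp_most_popular_entry"}),
]

first = results[0]        # list indexing: picks the first element
print(first["class"])     # lookup by attribute name works

error_name = None
try:
    first[0]              # a second [0] is treated as an attribute named 0
except KeyError as exc:
    error_name = type(exc).__name__
    print("KeyError: %s" % exc)
```

If that is what is going on, the first [0] picks one element out of the results, and the second [0] is an attribute lookup on that element, which would explain KeyError: 0.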

score 1 · Accepted Answer

Here is the problem:

news_list = huff_soup.find_all("div", {"class", "snp_most_popular_entry"})[0]
for news in news_list[0]:

Just remove one of those two [0]s and the problem (or at least this problem; I can't promise the rest of your code does what you want) will go away.

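I'm not going to explain why the code is wrong, because you really need to learn how to debug your code and work these things out yourself.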

Start by doing this in the interactive interpreter:

>>> huff_soup = BeautifulSoup(urllib2.urlopen("http://www.huffingtonpost.com").read())
>>> news_list = huff_soup.find_all("div", {"class", "snp_most_popular_entry"})

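Look at what it returns: what shape is it, and how do you get to the part you want interactively? Once you know that, it should be obvious how to do it in your script.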
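If you want to practise that kind of inspection without hitting the network, here is a rough sketch. The HTML string is made up for illustration; only the class name comes from your code. It assumes bs4 is installed, uses the built-in "html.parser", and is written so the print calls also work on Python 3:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the live page (assumption);
# only the class name is taken from the question.
html = """
<div class="snp_most_popular_entry"><a href="/a">First</a></div>
<div class="snp_most_popular_entry"><a href="/b">Second</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
news_list = soup.find_all("div", class_="snp_most_popular_entry")

print(type(news_list).__name__)  # what kind of container came back?
print(len(news_list))            # how many matches are in it?

first = news_list[0]
print(type(first).__name__)      # what is a single match?
print(first.attrs)               # the dict that first[...] looks things up in
print(first.a.get_text())        # text of the first link inside the match
```

Checking type() and len() at each step is usually enough to see which level you are at before you add another [0].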

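Even when things are too complicated to work with interactively, you can log things with print statements, run under the debugger, and so on. Don't just stare blindly at code that doesn't work and ask "why doesn't it work?", or post the code somewhere and ask someone else why it doesn't work, or you will never learn anything.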
