python - BeautifulSoup's findAll function fails to get all the required parts

Question

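I am currently using BeautifulSoup's findAll function to extract the parts I need from a web page. However, it fails to get all of them and returns None for some. My Python code looks like this: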

from bs4 import BeautifulSoup
import urllib

url = 'http://code.google.com/p/android/issues/detail?id=1060&colspec=ID Type Status Owner Summary Stars Opened Closed Modified Reporter Cc Project Reportedby Priority Version Target Milestone Component MergedInto BlockedOn Blocking Blocked Subcomponent Attachments'
issue_page = urllib.urlopen(url).read()

soup = BeautifulSoup(issue_page)
comment_parts =  soup.findAll(name = 'div',attrs={'class':'cursor_off vt issuecomment'})
for comment_part in comment_parts:
    print str(comment_part)+'\n'

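It only gets the first 48 comments; the 49th and later ones are not returned. I looked at the HTML source of the page, and the 49th has the same structure as the 48th and the ones before it. I really can't figure out why this happens! Can anyone help me? Thanks a lot!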

score 1 · Accepted Answer

When I run your code, I get 58 results.

... Your code ...
print len(comment_parts)

... and,

print comment_parts[-1]

prints the last item on the page. Are you doing anything differently?
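
For comparing results between machines, the same two checks can be run against a small fixed document. This is only a sketch: `SAMPLE_HTML` and `find_comments` are hypothetical names made up for illustration, the sample stands in for the real issue page, and it uses Python 3 with `find_all`, the current spelling of `findAll`. Passing `"html.parser"` explicitly is an assumption on my part; Beautiful Soup otherwise picks whichever parser is installed, and different parsers can build different trees from imperfect HTML, so pinning one makes the run repeatable.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the issue page: three comment blocks.
SAMPLE_HTML = """
<html><body>
<div class="cursor_off vt issuecomment">first</div>
<div class="cursor_off vt issuecomment">second</div>
<div class="cursor_off vt issuecomment">third</div>
</body></html>
"""


def find_comments(html, parser="html.parser"):
    """Return every comment div, using an explicitly named parser."""
    soup = BeautifulSoup(html, parser)
    # A multi-word class string matches the exact value of the class attribute.
    return soup.find_all("div", attrs={"class": "cursor_off vt issuecomment"})


comment_parts = find_comments(SAMPLE_HTML)
print(len(comment_parts))            # how many blocks were found
print(comment_parts[-1].get_text())  # text of the last block found
```

If the count on the real page still differs from mine, calling the function again with another installed parser name in place of `"html.parser"` is one way to see whether the parser is what differs.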
