python - BeautifulSoup 4 does not read all of the HTML

Translated from: https://stackoverflow.com/questions/18608990 2013-09-04T08:32:13.147

Viewed 45 times

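I have a web page to fetch. When I fetch it with urllib and print the content, I see the full content at its real length. After I parse the HTML with bs4, however, at least 5 div blocks are missing from the parsed result. When I parse the same HTML with the older BeautifulSoup package (BeautifulSoup 3), I see the real content, and those divs are included. I don't know what is wrong, but what I observe is that bs4 removes some divs that I need. What should I do to resolve this? Here is my example: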

# This one (BeautifulSoup 3, Python 2) does not remove any necessary parts. This is okay.

import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen("http://example").read())


# But this one (bs4) removes some necessary parts. This is not okay.

import urllib
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen("http://example").read())
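
One way to narrow this down is to count the div tags in the raw HTML independently of BeautifulSoup and compare that number with what each parse returns. The following is only a minimal sketch under stated assumptions: it is written for Python 3 and uses the standard library's html.parser; DivCounter, count_divs, and the sample markup are hypothetical names and data introduced for illustration, not part of the original question.

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Counts opening <div> tags seen in the raw markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.div_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "div":
            self.div_count += 1


def count_divs(raw_html):
    """Return the number of opening <div> tags in raw_html."""
    counter = DivCounter()
    counter.feed(raw_html)
    counter.close()
    return counter.div_count


# Hypothetical sample markup with an unclosed <p> and an uppercase tag.
sample = "<div><p>one<div>two</div></div><DIV>three</DIV>"
print(count_divs(sample))
```

The result can then be compared with len(soup.find_all("div")) for a soup built with an explicitly named parser, for example BeautifulSoup(raw_html, "html.parser"), or "lxml" or "html5lib" if those are installed. If the counts differ between parsers, that would suggest the missing divs come from how the chosen parser handles the page's markup; this is an assumption to check, not a confirmed diagnosis.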

Thank you.
