python - Python: urlopen not downloading the entire site

Question

Greetings,

I have done:

import urllib

site = urllib.urlopen('http://www.weather.com/weather/today/Temple+TX+76504')
site_data = site.read()
site.close()

but the result doesn't match the page source I see when the page is loaded in Firefox.

I suspected the user agent and did this:

class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.2.8) Gecko/20100722 Ubuntu/10.04 (lucid) Firefox/3.6.8"

urllib._urlopener = AppURLopener()

and downloaded the page again, but it still doesn't fetch the whole page.

Can someone please help me do user agent switching, if that is the likely culprit?

Thanks, Narnie

score 3 · Accepted Answer

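More likely there is an iframe in the page, or JavaScript is modifying the DOM after it loads. If it is an iframe, you will have to parse the page to get the iframe's URL, or, if this is a one-off, just do it by hand. If it is JavaScript, I have heard selenium-rc is good, but I have no first-hand experience with it.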
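For the iframe case, a minimal sketch of pulling the iframe URLs out of the downloaded HTML might look like the following. It assumes Python 3 and only the standard library (the question's code is Python 2), and the names `IframeCollector` and `find_iframe_urls` are made up for illustration. Each URL it returns would then need to be fetched separately.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.iframe_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative src values against the URL of the page.
                self.iframe_urls.append(urljoin(self.base_url, src))


def find_iframe_urls(html, base_url):
    """Return absolute URLs for all iframes found in the html string."""
    parser = IframeCollector(base_url)
    parser.feed(html)
    return parser.iframe_urls
```

Here `html` would be the decoded text of `site_data` from the question and `base_url` the address that was requested. This only finds iframes present in the static HTML; frames injected by JavaScript will not show up.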

score 2 · Accepted Answer

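A downloaded page displayed locally can look different for several reasons, such as relative links (which can be fixed by adding <base href="http://www.weather.com/today/"> to the page's head element) or non-functional ajax requests (see the ways to circumvent the same-origin policy).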
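As a rough illustration of the base-href fix, the sketch below inserts a `<base>` tag right after the opening `<head>` tag of a saved page. It assumes Python 3 and a page that has a plain `<head>` element; `add_base_href` is a hypothetical helper name, and a regular expression is used only to keep the example short (a real HTML parser would be more robust).

```python
import re


def add_base_href(html, base_url):
    """Insert <base href="..."> after the first opening <head> tag.

    Returns the html unchanged if no <head> tag is found.
    """
    base_tag = '<base href="%s">' % base_url
    # (\s[^>]*)? allows attributes on <head> but avoids matching <header>.
    pattern = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
    # A function replacement avoids backslash handling in base_url.
    return pattern.sub(lambda m: m.group(0) + base_tag, html, count=1)
```

Writing the returned string to a local file and opening it in a browser should make relative links and stylesheets resolve against the original site rather than the local directory.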
