python - Data from urllib2 differs from the data in Safari's Web Inspector

Question

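I looked here and here for information about my problem, but had no luck.

I have written some Python code that is meant to fetch the source of a web page, as shown in Safari's Web Inspector. However, my application and Safari's Web Inspector give me different code. Here is my code so far: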

#!/usr/bin/python

import urllib2

# Request headers copied from Safari

hdr = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Cache-Control': 'max-age=0'}

# Build the request

req = urllib2.Request("https://www.google.com/#q=rainbow&safe=active", headers=hdr)

# Try to fetch the page; on an HTTP error, print the error body instead
try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    print e.fp.read()
else:
    # Only read the response if the request succeeded
    print page.info()
    content = page.read()
    print content

The headers match what the Web Inspector shows:

Web Inspector

However, for a Google search for "rainbow", the returned code is different.

My Python:

http://paste.ubuntu.com/6270549/

Web Inspector:

http://paste.ubuntu.com/6270606/

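As far as I can tell, the output from my code is missing the many }catch(e){gbar_._DumpException(e)} lines that appear throughout the Web Inspector code. Also, my output is only 78 lines long, whereas the Web Inspector code is 235 lines. Does this mean that my code is not retrieving all of the JavaScript or other parts of the web page? How can I make my code retrieve the same data as the Web Inspector?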

score 1 · Accepted Answer

You are using the wrong link to search with Google. The correct link is:

https://www.google.com/search?q=rainbow&safe=active

instead of:

https://www.google.com/#q=rainbow&safe=active

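When used from Python, the second link ends up at Google's home page rather than the search results, because it only works as a search URL inside a browser such as Safari. Everything after the # is a URL fragment, which an HTTP client does not send to the server, so the request is effectively for https://www.google.com/. In the browser, the page's JavaScript presumably reads the fragment and loads the results. That is why the code is different.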
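The difference between the two links can be sketched as follows. This is a minimal illustration written for Python 3, where the functionality of urllib2 lives in urllib.request and urllib.parse. The helper names server_visible_url, build_search_url, and build_request are hypothetical and not part of the original code; the User-Agent string is the one from the question. The sketch only builds the URL and the request object and does not contact Google.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request

# User-Agent string taken from the question.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) "
    "AppleWebKit/536.28.10 (KHTML, like Gecko) "
    "Version/6.0.3 Safari/536.28.10"
)


def server_visible_url(url):
    """Return the URL with its fragment removed.

    The fragment (everything after '#') stays on the client side,
    so this is the part of the URL a server can actually act on.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def build_search_url(query, safe="active"):
    """Build a /search URL that carries the query in the query string."""
    return "https://www.google.com/search?" + urlencode({"q": query, "safe": safe})


def build_request(query):
    """Build (but do not send) a request for the search results page."""
    return Request(build_search_url(query), headers={"User-Agent": USER_AGENT})


# The fragment-based link from the question reduces to the home page.
print(server_visible_url("https://www.google.com/#q=rainbow&safe=active"))

# The query-string link keeps the search terms in the request.
print(build_search_url("rainbow"))
```

To actually fetch the page, the request returned by build_request("rainbow") could be passed to urllib.request.urlopen and the response read with .read(). Even with the corrected link, the result may still not match the Web Inspector line for line, since the Inspector can show the page after scripts have modified it.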
