python - Why does my Python script give two different kinds of results?

Question

I have a script like this:

 import mechanize
 url = "http://www.globalhide.com/browse.php?u=u=http://www.whoisxmlapi.com/whoisserver/WhoisService?domainName=google.com"
 br = mechanize.Browser()
 br.set_handle_robots(False)  # don't fetch or obey robots.txt
 br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
 response = br.open(url)
 content = response.read()  # raw bytes of the response body
 # Binary mode writes the bytes unchanged (and is required on Python 3)
 with open('q.html', 'wb') as f:
     f.write(content)

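When I run it from the Python shell, I get the result I need, and it is correct. But when I save it in a file something.py and run it with python something.py, q.html contains something different. What is wrong with my code?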

score 2 · Accepted Answer

I don't think there is anything wrong with your code. Changing the requested URL returns good data.

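The block is implemented by globalhide.com itself. The link you added in your question leads to the same page (more or less) as the one you attached. I can't tell you exactly how this hotlink blocking is implemented, but it could be done through the Referer header. Looking into Referer spoofing might help you.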
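If you want to try the Referer route, the request only needs one extra header. The sketch below uses the standard library's `urllib.request` instead of mechanize so the request can be built and inspected without sending anything; with mechanize the equivalent would be appending a `('Referer', ...)` tuple to `br.addheaders`. The Referer value (the proxy's own front page) is an assumption, not something globalhide.com is known to check.

```python
import urllib.request

# URL and User-Agent copied from the question.
TARGET = "http://www.globalhide.com/browse.php?u=u=http://www.whoisxmlapi.com/whoisserver/WhoisService?domainName=google.com"
USER_AGENT = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1"
# Assumption: pretend the click came from the proxy's own front page.
REFERER = "http://www.globalhide.com/"


def build_request(url, referer):
    """Build (but do not send) a GET request that carries a Referer header."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "Referer": referer}
    )


req = build_request(TARGET, REFERER)
# To actually send it: content = urllib.request.urlopen(req).read()
```

Building the `Request` separately makes it easy to check which headers will go out before hitting the site.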

Edit

I went a bit overboard on the Referer spoofing. I'd go with Aaron's cookie suggestion.

score 2 · Accepted Answer

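For that URL I sometimes get the XML and sometimes the "no hotlinking" page (Chrome on Linux). The first hit on the URL returns the no-hotlinking page. If I clear my cookies and visit the page again, I get the no-hotlinking image.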

This site seems to require cookies. The following should work with your code:

 # Add after br = mechanize.Browser(): give the browser an explicit cookie jar
 policy = mechanize.DefaultCookiePolicy(rfc2965=True)
 cj = mechanize.LWPCookieJar(policy=policy)
 br.set_cookiejar(cj)
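To see why a shared cookie jar changes the outcome, here is a self-contained toy reproduction. It replaces globalhide.com with a hypothetical local server (`HotlinkGuard`) that returns a block page until the client sends back the `s` session cookie set on the first visit; the cookie value and page bodies are made up. It uses the standard library's `http.cookiejar` and `urllib.request`, whose cookie model mechanize's cookie classes share.

```python
import http.cookiejar
import http.cookies
import http.server
import threading
import urllib.request


class HotlinkGuard(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the proxy: the real payload is served only to
    clients that send back the session cookie set on an earlier visit."""

    def do_GET(self):
        cookies = http.cookies.SimpleCookie(self.headers.get("Cookie", ""))
        has_session = "s" in cookies
        body = b"<xml>real data</xml>" if has_session else b"no hotlinking"
        self.send_response(200)
        if not has_session:
            # Made-up session id, named after the 's' cookie listed in Edit 2.
            self.send_header("Set-Cookie", "s=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice(url):
    """Fetch url twice through one opener that shares a single cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore *_proxy env vars; stay local
        urllib.request.HTTPCookieProcessor(jar),
    )
    with opener.open(url) as resp:  # no cookie yet: block page, cookie stored
        first = resp.read()
    with opener.open(url) as resp:  # cookie sent back: real payload
        second = resp.read()
    return first, second, jar


server = http.server.HTTPServer(("127.0.0.1", 0), HotlinkGuard)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = "http://127.0.0.1:%d/browse.php" % server.server_port
    first, second, jar = fetch_twice(url)
finally:
    server.shutdown()
    server.server_close()

print(first, second)
```

The first fetch gets the block page and stores the cookie; the second, made through the same opener, gets the payload. A fresh `python something.py` run starts with an empty jar and makes only one request, which is one plausible reason it behaves differently from a longer interactive session.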

For more on the different ways to handle cookies, see Mechanize Docs - Cookies.

Edit 1: You should save the cookie jar; see Cookielib - Save.
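One detail matters when saving: the `s` cookie listed in Edit 2 expires "When the browsing session ends", so it is a session cookie, and `LWPCookieJar.save()` skips session cookies unless `ignore_discard=True` is passed (the same flag is needed on `load()`). A minimal sketch with the standard library's `http.cookiejar` (the Python 3 name of cookielib), assuming a hypothetical temporary file `cookies.lwp`:

```python
import http.cookiejar
import os
import tempfile


def session_cookie(name, value, domain):
    """Build a session cookie (no expiry, discard=True), like the 's' cookie in Edit 2."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def round_trip(path, ignore_discard):
    """Save a jar holding one session cookie to path, reload it, return the reloaded jar."""
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(session_cookie("s", "x2tjlhb1qfidn5t1ds8kvd24p5", "www.globalhide.com"))
    jar.save(ignore_discard=ignore_discard)

    reloaded = http.cookiejar.LWPCookieJar(path)
    reloaded.load(ignore_discard=ignore_discard)
    return reloaded


path = os.path.join(tempfile.mkdtemp(), "cookies.lwp")  # hypothetical file name
dropped = round_trip(path, ignore_discard=False)  # default: session cookie is not saved
kept = round_trip(path, ignore_discard=True)      # session cookie survives the round trip
```

mechanize's own `LWPCookieJar` is assumed to take the same `ignore_discard` keyword on `save()` and `load()`; check its documentation before relying on that.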

Edit 2: Here is the cookie information the site set for me:

Name:   __utma
Content:    53296278.1653562620.1363413018.1311413018.1337443014.1
Domain: .globalhide.com
Path:   /
Send for:   Any kind of connection
Accessible to script:   Yes
Created:    Wednesday, May 1, 2013 6:56:58 AM
Expires:    Friday, May 1, 2015 6:56:58 AM

Name:   s
Content:    x2tjlhb1qfidn5t1ds8kvd24p5
Domain: www.globalhide.com
Path:   /
Send for:   Any kind of connection
Accessible to script:   Yes
Created:    Wednesday, May 1, 2013 6:56:57 AM
Expires:    When the browsing session ends
