python - Server error when using urllib2 on Google AppEngine

Question

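I'm not sure why this simple code, hosted on Google AppEngine, returns a server error whenever any query is submitted to the form. The problem seems to be related to the line html = urllib2.urlopen("http://google.com/search?q=" + q).read(), because the code works fine without it.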

import webapp2
import urllib2


form="""
<form action="/process">
    <input name="q">
    <input type="submit">
</form>
"""


class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.out.write(form)


class ProcessHandler(webapp2.RequestHandler):
    def get(self):
        q = self.request.get("q")
        html = urllib2.urlopen("http://google.com/search?q=" + q).read()
        self.response.out.write(html)


app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/process', ProcessHandler)],
                               debug=True)

This is the error that is returned:

Error: Server Error
The server encountered an error and could not complete your request.

If the problem persists, please report your problem and mention this error message and the query that caused it.

score 1 · Accepted Answer

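Possibly www.google.com does not accept this kind of direct connection and drops connections from certain user agents. In a plain Python environment you can change the user agent string, but I don't think that can be done through Google App Engine.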

score 0 · Accepted Answer

Google is returning a 403 for your search string:

>>> import urllib2
>>> html = urllib2.urlopen("http://google.com/search?q=Test").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
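An uncaught HTTPError like the one in this traceback is what ends up behind the generic "Server Error" page in the question. As a minimal sketch (not the poster's code), the fetch could be wrapped so the handler reports the upstream status instead of crashing. The function name fetch_or_describe_error and its opener parameter are hypothetical, introduced only for illustration; the try/except import is an assumption so that the sketch also runs on Python 3.

```python
try:  # Python 2, as used in the question
    from urllib2 import HTTPError, urlopen
except ImportError:  # Python 3 fallback (assumption, not part of the original)
    from urllib.error import HTTPError
    from urllib.request import urlopen


def fetch_or_describe_error(url, opener=urlopen):
    """Return (status, body); on an HTTP error, body is a short message.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urlopen.
    """
    try:
        response = opener(url)
        return response.getcode(), response.read()
    except HTTPError as err:
        # err.code is the numeric status (403 in the traceback above),
        # err.msg is the reason phrase sent by the server.
        message = "Upstream request failed: HTTP %d %s" % (err.code, err.msg)
        return err.code, message
```

Inside ProcessHandler.get, the returned body could then be passed to self.response.out.write, so a 403 from Google shows up as a readable message rather than a 500.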

However, fetching the home page without a search query works:

html = urllib2.urlopen("http://google.com").read()

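So it looks like Google is trying to block this kind of search. As the other poster suggested, changing the user agent string might stop the 403. Pick a common one!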

I just tested with a Mozilla user agent set, and I get the result I think you are looking for:

import urllib2
headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://google.com/search?q=Test', None, headers)
html = urllib2.urlopen(req).read()
print html
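The snippet above hard-codes the query "Test", whereas the handler in the question concatenates raw user input into the URL. A rough sketch of combining the two, with the query URL-encoded and the User-Agent header set, might look like the following. The helper name build_search_request is hypothetical, and the try/except import is an assumption so that the sketch also runs on Python 3; it only builds the request and does not contact Google.

```python
try:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 fallback (assumption, not part of the original)
    from urllib.parse import urlencode
    from urllib.request import Request


def build_search_request(query, user_agent="Mozilla/5.0"):
    # urlencode escapes spaces, '&', and other reserved characters,
    # so user input cannot break the query string.
    url = "http://google.com/search?" + urlencode({"q": query})
    # Passing None as the data argument keeps this a GET request.
    return Request(url, None, {"User-Agent": user_agent})


req = build_search_request("app engine & urllib2")
print(req.get_full_url())
```

The resulting req can be passed to urllib2.urlopen exactly as in the answer's snippet. Under Python 2, non-ASCII input from the form would probably need to be encoded to UTF-8 before being handed to urlencode.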
