
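I'm trying to log in to a site using Python, but the site seems to send a cookie and a redirect in the same response. Python follows that redirect, which stops me from reading the cookie sent by the login page. How can I stop Python's urllib (or urllib2) urlopen from following redirects?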

4 Answers

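There are a couple of things you could do:

  1. Build your own HTTPRedirectHandler that intercepts every redirect.
  2. Create an instance of HTTPCookieProcessor and install that opener, so that you have access to the cookiejar.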

Here's a quick example that shows both:

import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # The 3xx response headers (Location, Set-Cookie) are available here
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # handle every redirect status the same way
    http_error_301 = http_error_303 = http_error_307 = http_error_302

# collects cookies from every response, including the 302 itself
cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")  # placeholder: replace with the login URL
print response.read()

print cookieprocessor.cookiejar
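For Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar, the same idea might look like the sketch below. It assumes a hypothetical local DemoSite server whose /login page answers with a 302 plus a Set-Cookie header (the paths and the session=abc123 cookie are made up for the demo). It shows that the redirect handler sees the 3xx headers before following them, and that HTTPCookieProcessor stores the cookie from the 302 response even though the redirect is followed.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class DemoSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the login site: /login replies 302 + Set-Cookie."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"home page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Records each 3xx response (Location, Set-Cookie), then follows it as usual."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def http_error_302(self, req, fp, code, msg, headers):
        # The redirect's own headers are visible here, before it is followed.
        self.seen.append((code, headers["Location"], headers["Set-Cookie"]))
        return super().http_error_302(req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


server = http.server.HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
redirects = LoggingRedirectHandler()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy for the local demo
    redirects,
    urllib.request.HTTPCookieProcessor(jar),
)
with opener.open(base + "/login") as resp:
    final_url = resp.geturl()
    body = resp.read()

server.shutdown()
server.server_close()

print(final_url)               # ends with /home: the redirect was followed
print(redirects.seen)          # [(302, '/home', 'session=abc123; Path=/')]
print([c.name for c in jar])   # ['session']
```

Because HTTPCookieProcessor runs before the redirect is processed, installing it may already be enough to keep the login cookie, even without stopping the redirect.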
answered 2009-02-16T21:13:43.600

If all you need is to stop redirection, there is a simple way to do it. For example, I only want to get the cookies, and for better performance I don't want to be redirected to any other page. I also want the 3xx status code to be preserved; let's use 302 as the example.

import urllib2
import cookielib

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # only add this line to stop 302 redirection.
        if code == 302: return response

        # otherwise behave exactly like the stock HTTPErrorProcessor
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

This way, you don't even need to touch urllib2.HTTPRedirectHandler.http_error_302().
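A Python 3 sketch of the same 302-only processor, assuming a hypothetical local DemoSite server where /login answers 302 with a Set-Cookie header and any other path answers 404. Only 302 is special-cased, so the 404 still raises HTTPError as usual.

```python
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.request


class DemoSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical site: /login -> 302 + Set-Cookie, anything else -> 404."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


class Stop302Processor(urllib.request.HTTPErrorProcessor):
    """Hand 302 responses straight back; other non-2xx codes are handled as usual."""

    def http_response(self, request, response):
        if response.code == 302:
            return response
        return super().http_response(request, response)

    https_response = http_response


server = http.server.HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy for the local demo
    urllib.request.HTTPCookieProcessor(jar),
    Stop302Processor,
)

with opener.open(base + "/login") as resp:
    status, location = resp.code, resp.headers["Location"]

missing_status = None
try:
    opener.open(base + "/missing")
except urllib.error.HTTPError as err:
    missing_status = err.code  # the 404 still raises: only 302 is passed through
    err.close()

server.shutdown()
server.server_close()

print(status, location)         # 302 /home
print([c.name for c in jar])    # ['session']
print(missing_status)           # 404
```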

A more common case is that we simply want to stop all redirection (as the question asks):

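import urllib
import urllib2
import cookielib

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        # return every response as-is: 3xx is not followed and 4xx/5xx don't raise HTTPError
        return response

    https_response = http_response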

Then use it like this:

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    redirection_target = response.headers['Location']
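In Python 3 the same NoRedirection processor carries over almost unchanged. The sketch below assumes a hypothetical local DemoSite server where POST /login answers 302 with a Set-Cookie header and any other path answers 404; the form field user=alice is made up for the demo.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request


class DemoSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical site: POST /login -> 302 + Set-Cookie, any other path -> 404."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # discard form data
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Return every response untouched: 3xx is not followed, 4xx/5xx do not raise."""

    def http_response(self, request, response):
        return response

    https_response = http_response


server = http.server.HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy for the local demo
    NoRedirection,
    urllib.request.HTTPCookieProcessor(jar),
)
form = urllib.parse.urlencode({"user": "alice"}).encode()  # hypothetical login fields

redirection_target = None
with opener.open(base + "/login", form) as response:
    if response.code == 302:
        redirection_target = response.headers["Location"]

with opener.open(base + "/missing", form) as other:
    missing_status = other.code  # a 404 comes back as an ordinary response

server.shutdown()
server.server_close()

print(redirection_target)       # /home
print([c.name for c in jar])    # ['session']
print(missing_status)           # 404
```

Since NoRedirection also stops 4xx/5xx from raising HTTPError, callers have to check response.code themselves.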
answered 2012-07-31T16:33:51.010

urllib2.urlopen calls build_opener(), which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]

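You could build an opener yourself from that list minus HTTPRedirectHandler, then call open() on the result to open your URL. Note that urllib2.build_opener() adds every default handler back unless you pass it a subclass or instance of that handler, so passing it a shortened list still leaves HTTPRedirectHandler in place; create a urllib2.OpenerDirector() and add the remaining handlers with its add_handler() method instead. If you really dislike redirects, you can even call urllib2.install_opener(opener) to make your non-redirecting opener the default.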
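A Python 3 sketch of the hand-built opener, assuming a hypothetical local DemoSite server whose login page answers 302 with a Set-Cookie header. It first checks that build_opener() still contains an HTTPRedirectHandler when you don't pass one, then assembles an OpenerDirector without it. With no redirect handler, the 302 falls through to HTTPDefaultErrorHandler and surfaces as an HTTPError that still carries the Location header, while the cookie has already been stored.

```python
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.request


class DemoSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical login page: every GET answers 302 + Set-Cookie."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


# build_opener() puts back every default handler it isn't handed a subclass of,
# so merely not passing HTTPRedirectHandler does not remove it.
stock = urllib.request.build_opener(urllib.request.HTTPHandler)
redirect_still_present = any(
    isinstance(h, urllib.request.HTTPRedirectHandler) for h in stock.handlers
)

# Assemble an OpenerDirector by hand, leaving HTTPRedirectHandler out.
jar = http.cookiejar.CookieJar()
opener = urllib.request.OpenerDirector()
for handler in (
    urllib.request.HTTPHandler(),
    urllib.request.HTTPDefaultErrorHandler(),
    urllib.request.HTTPErrorProcessor(),
    urllib.request.HTTPCookieProcessor(jar),
):
    opener.add_handler(handler)

server = http.server.HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

status = location = None
try:
    opener.open(base + "/login")
except urllib.error.HTTPError as err:
    # No redirect handler, so HTTPDefaultErrorHandler raises for the 302;
    # the redirect's headers travel with the exception.
    status, location = err.code, err.headers["Location"]
    err.close()

server.shutdown()
server.server_close()

print(redirect_still_present)   # True
print(status, location)         # 302 /home
print([c.name for c in jar])    # ['session']
```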

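It sounds like your real problem is that urllib2 doesn't handle cookies the way you want. See also: How to use Python to log into a webpage and retrieve cookies for later usage?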

answered 2009-02-16T20:38:43.980

This question has been asked here before.

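Edit: If you have to deal with quirky web applications, you should probably try mechanize. It's a great library that simulates a web browser: you can control redirects, cookies, page refreshes... If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.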

answered 2009-02-16T20:46:59.640