python - Scrapy login problem with InitSpider

Question

I'm trying to use InitSpider, but for some reason it always fails to log in. My code is similar to the answer in the following post:

The response I see in the log is this:

2012-12-20 22:56:53-0500 [linked] DEBUG: Redirecting (302) to <GET https://example.com/> from <POST https://example.com/>

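Using the code from that post, I have the same init_request, login, and check_login_response functions. Log statements show that execution reaches the login function, but it never seems to reach the check_login_response function.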

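When I reimplement the code with BaseSpider and do the FormRequest in the parse function, I can log in without any problem. Is there a reason for this? Is there anything else I should be doing? Why does logging in with InitSpider give me a redirect?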

[Edit]

from scrapy.http import FormRequest, Request
from scrapy.spiders.init import InitSpider  # older Scrapy releases: scrapy.contrib.spiders.init


class DemoSpider(InitSpider):
    name = 'linked'
    login_page = 'https://example.com/login'  # placeholder: login URL (elided in the original)
    start_urls = ['https://example.com/']  # placeholder: all other URLs (elided in the original)

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(
            response,
            formdata={'username': 'username', 'password': 'password'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are successfully logged in."""
        # response.text is the decoded body (response.body is bytes in Python 3).
        if "Sign Out" in response.text:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin: release the held-back start requests.
            return self.initialized()
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        self.log('got to the parse function')

Above is my spider code.

score 2 · Accepted Answer

After struggling with this for a while, I figured it out and posted the solution on my blog:

http://tmblr.co/ZjkSZteCOTyH

Basically, I used BaseSpider and overrode its start_requests method to handle the login.
