python - Scrapy: how to issue an alternative request when one request fails (e.g. 404, 500)?

Question

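I have a problem with Scrapy. When one request fails (e.g. with a 404 or 500), how can I issue an alternative request? For example, two links both provide price information; if one fails, the other should be requested automatically.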

score 18 · Accepted Answer

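Use "errback" in the request, e.g. errback=self.error_handler, where error_handler is a function (just like a callback function). In this function, check the error code and issue the alternative request.

See errback in the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html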

score 9 · Accepted Answer

Just set handle_httpstatus_list = [404, 500] on the spider and check the status code in the parse method. Here is an example:

import scrapy


class MySpider(scrapy.Spider):
    name = "my_crawler"
    # Let responses with these status codes reach the callback
    # instead of being filtered out before parse() is called.
    handle_httpstatus_list = [404, 500]

    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        # The first request failed, so request the alternative URL.
        if response.status in self.handle_httpstatus_list:
            return scrapy.Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print(response.url)

        # parse the page and extract items

Hope that helps.
