python - Scrapy - 301 redirect in shell

Question

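I can't find a solution to the following problem. I'm using Scrapy (latest version) and trying to debug a spider. Running scrapy shell https://jigsaw.w3.org/HTTP/300/301.html does not follow the redirect (the shell uses the default spider to fetch the data). When I run my own spider, it follows the 301, but then I can't debug.

How can I make the shell follow the 301 so that I can debug the final page?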

score 10 · Accepted Answer

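Scrapy handles redirects with its Redirect Middleware, but that middleware is not enabled in the shell. A quick workaround is to fetch the redirect target yourself: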

scrapy shell "https://jigsaw.w3.org/HTTP/300/301.html"
# inside the shell: header values are bytes on Python 3, so decode before fetching
fetch(response.headers['Location'].decode())
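If the Location header holds a relative URL, passing it straight to fetch will not work. The following is a minimal sketch of resolving it first, using only the standard library. The helper name resolve_location and the sample header value are assumptions made for illustration, not part of Scrapy, and UTF-8 is assumed for the header encoding.

```python
from urllib.parse import urljoin


def resolve_location(base_url, location):
    """Return an absolute redirect target from a Location header value."""
    # Scrapy stores header values as bytes, so decode before joining
    # (UTF-8 is an assumption here).
    if isinstance(location, bytes):
        location = location.decode("utf-8")
    # urljoin leaves absolute URLs unchanged and resolves relative ones
    # against the URL that produced the redirect.
    return urljoin(base_url, location)


# Hypothetical values standing in for response.url and
# response.headers['Location'] inside the shell.
print(resolve_location("https://jigsaw.w3.org/HTTP/300/301.html", b"Overview.html"))
```

Inside the shell this would be used as fetch(resolve_location(response.url, response.headers['Location'])).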

Also, to debug your spider, you may want to inspect the response it actually receives:

from scrapy.shell import inspect_response

def parse(self, response):
    # the spider will stop here and open up an interactive shell during the run
    inspect_response(response, self)
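When the spider does follow the 301, it can help to see which URLs were passed through before the shell opens. Here is a small sketch, assuming the response came through RedirectMiddleware, which records the chain under the "redirect_urls" key of the request meta. The helper name describe_redirects and the sample meta dict are hypothetical.

```python
def describe_redirects(final_url, meta):
    """Summarise the redirect chain recorded in a request's meta dict."""
    # RedirectMiddleware stores the URLs it passed through under
    # "redirect_urls"; the key is absent when no redirect happened.
    chain = list(meta.get("redirect_urls", []))
    if not chain:
        return "no redirect: " + final_url
    return " -> ".join(chain + [final_url])


# Hypothetical meta dict standing in for response.meta in parse().
sample_meta = {"redirect_urls": ["https://jigsaw.w3.org/HTTP/300/301.html"]}
print(describe_redirects("https://jigsaw.w3.org/HTTP/300/Overview.html", sample_meta))
```

In parse, this could be logged with self.logger.info(describe_redirects(response.url, response.meta)) just before the call to inspect_response.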
