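I am using the FormRequest class in Scrapy to log in to a website and scrape its data. However, I am having trouble getting the page that results from the login. After I submit the form request, I am sent to a redirect page, but the page I want is the one that comes after it. In a normal browser the redirect page automatically forwards to the home page, but in the Scrapy spider the redirect page is returned as the response and the spider gets stuck there.
How can I get past this redirect page to reach the home page?
Here is the code: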
from scrapy import log
from scrapy.http import FormRequest
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request


class DmozSpider(BaseSpider):
    name = "dmoz"
    start_urls = [
        "https://blahblahblahsecurepage"
    ]

    def parse_evalPage(self, response):
        # Save the response body under a name taken from the first query parameter
        filename = response.url.split("?")[-1].split("&")[0]
        with open(filename, "wb") as f:
            f.write(response.body)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Only attempt to log in when the login form is present
        if hxs.select("//form[@id='loginform']"):
            return self.login(response)

    def after_login(self, response):
        # check login succeed before going on
        if "Please enter a valid username" in response.body:
            self.log("Login failed", level=log.ERROR)
            return
        # We've successfully authenticated, let's have some fun!
        else:
            # Request takes a URL string, not a response object
            return Request(response.url, callback=self.parse_evalPage)

    def login(self, response):
        # Fill in and submit the login form found on the page
        return [FormRequest.from_response(
            response,
            formdata={'j_username': 'XXXXXXXXXXXX', 'j_password': 'XXXXXXXXXXXXXX'},
            formnumber=1,
            callback=self.parse_evalPage)]
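
One possible direction, sketched under an assumption the question does not confirm: that the intermediate page forwards the browser with a `<meta http-equiv="refresh">` tag rather than with JavaScript. In that case the target URL can be read out of the redirect page's HTML and requested explicitly. The helper name `find_meta_refresh_url` and the example URLs are hypothetical and only illustrate the idea; the helper uses the Python 3 standard library only and does not depend on Scrapy.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshParser(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/home": delay first, then the URL
        _, _, rest = (attrs.get("content") or "").partition(";")
        match = re.match(r"\s*url\s*=\s*(.+)", rest, re.IGNORECASE)
        if match:
            self.target = match.group(1).strip().strip("'\"")


def find_meta_refresh_url(html, base_url):
    """Return the absolute meta-refresh target in html, or None if there is none.

    Hypothetical helper: html is the redirect page as text, base_url is its URL.
    """
    parser = _MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    # Resolve relative targets against the URL of the redirect page
    return urljoin(base_url, parser.target)
```

Inside the callback that receives the redirect page, the result could be passed to a new `Request` (for example `Request(url, callback=self.parse_evalPage)`) when it is not `None`. If the page redirects through JavaScript instead, this approach will not find anything, because Scrapy does not execute scripts; the target URL would then have to be located in the script source by hand.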