python - Scrapy - Retrieving an authentication token from a JavaScript response

Question

I need help with this particular situation.

Scenario

Calling the site

http://www.example.com/index.php

I can get this URL from a <script> tag

https://www.example.com/anotherpage.php?key=ABCDFG
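As an illustration of this step, here is a minimal sketch that pulls the key out of script text. The sample script string and the `extract_key` helper are hypothetical; the real page markup is not shown here, so the pattern may need adjusting.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical inline script, shaped like the URL described above.
SAMPLE_SCRIPT = 'var next = "https://www.example.com/anotherpage.php?key=ABCDFG";'


def extract_key(script_text):
    """Return the `key` query parameter of the first URL that carries one, or None."""
    # Match a URL up to the next quote or whitespace, requiring a key= parameter.
    match = re.search(r"https?://[^\s\"']+\?[^\s\"']*key=[^\s\"']+", script_text)
    if match is None:
        return None
    # Let the standard library split the query string instead of hand-parsing it.
    values = parse_qs(urlparse(match.group(0)).query).get("key")
    return values[0] if values else None


print(extract_key(SAMPLE_SCRIPT))
```

Inside a spider, the script text would come from the response (for example via an XPath such as `//script/text()`) rather than from a hard-coded string.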

Using the key, I have to call this endpoint

https://www.example.com/login.php?key=ABCD

to retrieve the SessionID, which is stored in the JavaScript response

-- omitted

private._sessID='MYSESSIONID';

-- omitted
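For reference, a regular expression along the following lines can pick the session ID out of such a response. This is a sketch that assumes the assignment always has the form shown above; `extract_session_id` is a hypothetical helper name.

```python
import re

# Matches an assignment such as: private._sessID='MYSESSIONID';
# Either quote style is accepted, since the real script may differ.
SESSION_RE = re.compile(r"_sessID\s*=\s*['\"]([^'\"]+)['\"]")


def extract_session_id(js_text):
    """Return the session ID found in the JavaScript text, or None if absent."""
    match = SESSION_RE.search(js_text)
    return match.group(1) if match else None


sample = "-- omitted\nprivate._sessID='MYSESSIONID';\n-- omitted"
print(extract_session_id(sample))
```

In a Scrapy callback the same function could be applied to `response.text`.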

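Finally, using this session ID and sending the right POST request, I can navigate all the pages I need.

My impasse

I can simulate all of these steps with scrapy shell and regular expressions (and everything works), but I don't know how to manage these steps inside a Scrapy spider before data extraction starts.

Can anyone help me?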

score 2 · Accepted Answer

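You need to start from the base URL http://www.example.com/index.php by requesting it in the start_requests method. Write a callback for that request which extracts the information needed for the next endpoint, and pass the result on to further callbacks. Once the last callback has the session ID, you can start the scraping process.

You can implement it along the following lines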

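import scrapy
from scrapy.spiders import CrawlSpider

class TokenSpider(CrawlSpider):
    name = "token_spider"  # placeholder name; every spider needs one

    def start_requests(self):
        # Scrapy calls start_requests() (plural) and expects an iterable of requests.
        yield scrapy.Request("http://www.example.com/index.php", callback=self.parse_authentication_token)

    def parse_authentication_token(self, response):
        # Extract the token (or whatever is required), then hand over to the parent's parsing.
        # Depending on the Scrapy version, CrawlSpider may expose this as _parse() instead of parse().
        yield from super().parse(response)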
