python - Scrapy parse JSON output

Question

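I am using Scrapy to crawl a website. Some pages use AJAX, so I captured the AJAX requests to get the actual data. So far, so good. The output of those AJAX requests is JSON. Now I would like to parse the JSON, but Scrapy only provides HtmlXPathSelector. Has anyone successfully converted JSON output into HTML and parsed it with HtmlXPathSelector?

Thank you very much in advance.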

score 5 · Accepted Answer

import json

response = json.loads(jsonResponse)

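The code above decodes the JSON you received. After that, you can process the resulting Python object any way you like.

(Replace jsonResponse with the JSON text you got from the AJAX request.)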
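As a minimal sketch of how the decoding step could sit inside a spider callback: the helper name `parse_ajax_body`, the payload shape, and the `items`, `title`, and `price` keys are all hypothetical, and the sample bytes stand in for `response.body` so the example runs without Scrapy.

```python
import json


def parse_ajax_body(body):
    """Decode a JSON response body and yield one dict per entry.

    `body` is the raw bytes of the response (response.body in Scrapy).
    The "items", "title", and "price" keys are hypothetical.
    """
    payload = json.loads(body.decode("utf-8"))
    for entry in payload.get("items", []):
        yield {"title": entry.get("title"), "price": entry.get("price")}


# Stand-in for response.body, so the sketch runs without Scrapy
sample_body = b'{"items": [{"title": "Widget", "price": 9.5}]}'
results = list(parse_ajax_body(sample_body))
```

In a real spider, the callback would pass `response.body` to a helper like this and yield the resulting dicts, with no XPath selector involved.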

score 0 · Accepted Answer

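A bit convoluted, but it works.

If you are interested in using XPaths on JSON output...

Disclaimer: probably not the best solution. +1 if someone improves this approach.

- Install the dicttoxml package (pip recommended)

- Download the output using Scrapy's regular Request mechanism

In the spider: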

import json

import dicttoxml
from scrapy.http import Request
from scrapy.selector import XmlXPathSelector

# Inside a spider callback: schedule the AJAX request
request = Request(link, callback=self.parse_resp)
yield request

def parse_resp(self, response):
    # Load the response body using Python's json module.
    # (Do not name the variable `json`: that would shadow the module.)
    json_dict = json.loads(response.body)
    # Transform the contents into XML using dicttoxml
    xml = dicttoxml.dicttoxml(json_dict)
    # Apply Scrapy's XmlXPathSelector to the XML string and start using XPaths.
    # The string is passed directly; a parsed lxml element is not valid for text=.
    xxs = XmlXPathSelector(text=xml)
    data = xxs.select(".//*[@id='count']/text()").extract()
    return data

I do it this way because I maintain all the XPaths for all my spiders in one place (a config file).
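The idea of keeping XPaths in one place can be illustrated roughly as follows. This sketch swaps in the standard library's `xml.etree.ElementTree` for both dicttoxml and Scrapy's selector so that it runs without extra packages; the `dict_to_xml` helper, the `XPATHS` config, and the sample JSON are all hypothetical, and ElementTree supports only a subset of XPath.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_xml(tag, value):
    """Recursively build an Element: dict keys become child tags,
    list entries become repeated <item> children."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_xml(key, child))
    elif isinstance(value, list):
        for child in value:
            elem.append(dict_to_xml("item", child))
    else:
        elem.text = str(value)
    return elem


# Hypothetical central config: field name -> XPath
XPATHS = {
    "count": ".//count",
    "names": ".//users/item/name",
}

raw = '{"count": 2, "users": [{"name": "ann"}, {"name": "bob"}]}'
root = dict_to_xml("root", json.loads(raw))

# Apply every configured XPath to the converted document
extracted = {
    field: [node.text for node in root.findall(path)]
    for field, path in XPATHS.items()
}
```

Here the JSON keys become element names, so the configured XPaths address fields by key name. Note that this simple conversion assumes every JSON key is a valid XML tag name.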
