python - Scraping a JSON response in Scrapy

Question

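I've been learning to scrape web pages with Scrapy. Some of the data I've been given is in JSON format, and so far I haven't managed to scrape a JSON page. I know it can be done (thanks to my only previous question and its helpful replies), but I can't get it to work. I'd like to know a) whether anyone has an example of a Scrapy script that successfully handles JSON, or b) whether I could get a few pointers.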

I've been using this page for help: http://www.jroller.com/evans/entry/parsing_json_with_python, trying to scrape the page it uses as an example.

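My spider runs, but it doesn't scrape anything. I know I'm making mistakes, but I feel like I've changed every little aspect of the spider at least once and am now just confusing myself.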

The basis of my spider (edited according to the suggestions below) looks like this:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from learnjson.items import learnjsonitems, Field
import json
import urllib2

class MySpider(BaseSpider):
    name = "jsonexample"
    allowed_domains = ["googleapis.com"]
    req = urllib2.urlopen('http://maps.googleapis.com/maps/api/geocode/json?address=8-10%20Broadway,%20London%20SW1H%200BG,%20United%20Kingdom&sensor=false')

    def json_parse(self, response):
        jsonresponse = json.loads(response.body_as_unicode())
        latitude = jsonresponse["lat"]

        print item["lat"]

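Using the example page (not my data, just something to practise on), I've been trying to extract the street address and the latitude/longitude, but nothing I've tried seems to work.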
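For the geocoding page used as practice data, the latitude is not a top-level key, so `jsonresponse["lat"]` raises `KeyError`. The Geocoding API nests each match inside a `results` list, with the address under `formatted_address` and the coordinates under `geometry`, then `location`. The following is a minimal sketch assuming that response layout. The sample body is a trimmed, hand-written stand-in with placeholder coordinates, not real API output, and `extract_address_and_coords` is a hypothetical helper name:

```python
import json

# Trimmed, hand-written stand-in for a Geocoding API JSON body.
# The coordinates are placeholders, not the real geocoding result.
SAMPLE_BODY = """
{
  "results": [
    {
      "formatted_address": "8-10 Broadway, London SW1H 0BG, UK",
      "geometry": {"location": {"lat": 51.0, "lng": -0.1}}
    }
  ],
  "status": "OK"
}
"""


def extract_address_and_coords(body):
    """Hypothetical helper: return a list of (address, lat, lng) tuples."""
    data = json.loads(body)
    # Non-"OK" statuses (e.g. no matches) carry no usable results.
    if data.get("status") != "OK":
        return []
    rows = []
    for result in data["results"]:
        # Coordinates are nested two levels down, not at the top level.
        location = result["geometry"]["location"]
        rows.append((result["formatted_address"], location["lat"], location["lng"]))
    return rows


for address, lat, lng in extract_address_and_coords(SAMPLE_BODY):
    print(address, lat, lng)
```

Inside a spider callback, the same helper could be fed the decoded body, e.g. `response.body_as_unicode()` as in the code above.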

score 1 · Accepted Answer

I think you've missed importing json. Add

import json

to your code.

You can also fetch and open your JSON file with urllib2; that works fine. You can then handle the JSON response with code like this:

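import json

from scrapy.spider import BaseSpider


class MySpider(BaseSpider):
    ...

    def parse(self, response):
        # json.loads needs the response text, not the Response object itself
        jsonresponse = json.loads(response.body_as_unicode())

        # MyItem stands for your project's Item class with a firstName field
        item = MyItem()
        item["firstName"] = jsonresponse["firstName"]

        return item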
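The detail that matters is what gets passed to `json.loads`: it accepts only `str`, `bytes` or `bytearray`, so handing it the response object itself raises `TypeError`. The JSON text lives in the response body. A small sketch of that difference, using a hypothetical `FakeResponse` class that only mimics the `body` attribute of a downloaded response (it is not Scrapy's class):

```python
import json


class FakeResponse:
    """Hypothetical stand-in for a downloaded response; only `body` is modelled."""

    def __init__(self, body):
        self.body = body  # raw bytes, as an HTTP response body would be


response = FakeResponse(b'{"firstName": "Ada"}')

# Passing the object itself fails: json.loads needs str, bytes or bytearray.
error = None
try:
    json.loads(response)
except TypeError as exc:
    error = exc
print("json.loads(response) raised:", error)

# Parsing the body works (bytes are accepted directly on Python 3.6+).
data = json.loads(response.body)
print(data["firstName"])
```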

Hope this helps :)
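For the urllib2 route mentioned in the answer, the Python 3 counterpart lives in `urllib.request`. A minimal sketch, assuming Python 3.6+ (where `json.load` accepts a binary file object); `fetch_json` is a hypothetical helper. This call blocks, so inside a Scrapy spider it is usually simpler to let Scrapy download the page and parse the response body in the callback:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Hypothetical helper: download `url` and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Demonstrate without network access by using a data: URL,
# which urllib.request can open (Python 3.4+).
demo_url = "data:application/json," + quote(json.dumps({"status": "OK"}))
result = fetch_json(demo_url)
print(result["status"])
```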
