python - JSON response and Scrapy

Question

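I am trying to use Scrapy to parse a JSON response from the New York Times API into a CSV, so that I can get a summary of all related articles for a particular query. I would like to write this out as a CSV file containing the link, publication date, summary, and title, so that I can run some keyword searches on the summary description. I am new to both Python and Scrapy, but here is my spider (I am getting an HTTP 400 error). I have removed my API key from the spider: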

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from nytimesAPIjson.items import NytimesapijsonItem
import json
import urllib2

class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["http://api.nytimes.com/svc/search/v2/articlesearch"]
    req = urllib2.urlopen('http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx)

      def json_parse(self, response):
          jsonresponse= json.loads(response)

          item = NytimesapijsonItem()
          item ["pubDate"] = jsonresponse["pub_date"]
          item ["description"] = jsonresponse["lead_paragraph"]
          item ["title"] = jsonresponse["print_headline"]
          item ["link"] = jsonresponse["web_url"]
          items.append(item)
          return items

If anyone has any ideas or suggestions, including ones outside of Scrapy, please let me know. Thanks in advance.

score 2 · Accepted Answer

You should set start_urls and use the parse method:

from scrapy.spider import BaseSpider
import json


class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["api.nytimes.com"]
    start_urls = ['http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx']

    def parse(self, response):
        # json.loads needs the raw body text, not the Response object itself
        jsonresponse = json.loads(response.body)

        print jsonresponse
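
Once the JSON is loaded, the fields the question asks for still have to be pulled out and written as CSV. The following is a minimal sketch in plain Python 3, not part of the original answer. It assumes the Article Search v2 payload nests articles under response -> docs and keeps the headline in a nested object with main and print_headline keys; check that against a real response before relying on it. The helper names build_url, docs_to_rows, and rows_to_csv are hypothetical.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Endpoint taken from the question; everything else here is illustrative.
BASE_URL = "http://api.nytimes.com/svc/search/v2/articlesearch.json"
FIELDS = ["link", "pubDate", "title", "description"]


def build_url(query, api_key, **params):
    """Percent-encode the query string so spaces and quotes are URL-safe."""
    params.update({"q": query, "api-key": api_key})
    return BASE_URL + "?" + urlencode(sorted(params.items()))


def docs_to_rows(body):
    """Turn a JSON response body (str or bytes) into one dict per article.

    Assumed layout: {"response": {"docs": [{...}, ...]}}.
    """
    payload = json.loads(body)
    docs = payload.get("response", {}).get("docs", [])
    rows = []
    for doc in docs:
        headline = doc.get("headline") or {}
        rows.append({
            "link": doc.get("web_url", ""),
            "pubDate": doc.get("pub_date", ""),
            # Fall back to the main headline when no print headline is set.
            "title": headline.get("print_headline") or headline.get("main", ""),
            "description": doc.get("lead_paragraph", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Inside a spider, parse could call docs_to_rows(response.body) and yield each row, leaving the CSV writing to Scrapy's feed export, provided the installed Scrapy version accepts plain dicts as items. Building the URL with build_url also avoids sending the raw space and double quotes in q="financial crime", which may be related to the HTTP 400, though the thread does not confirm the cause.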

