
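I'm trying to write a parser for m-ati.su using Scrapy. As a first step, I need to get the values and text of the "From" and "To" comboboxes, which list different city names. I looked at the requests in Firebug and wrote: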

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # POST the autocomplete query as form-encoded data
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                        callback=self.ati_from,
                        formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'})

    def ati_from(self, response):
        # Save the raw response body for inspection
        with open('results.txt', 'wb') as f:
            f.write(response.body)

For this request I get a "500 Internal Server Error". What am I doing wrong? Sorry for my bad English. Thanks.


1 Answer


I think you may have to add an X-Requested-With: XMLHttpRequest header to the POST request, so you could try the following:

    def parse(self, response):
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList', 
                          callback=self.ati_from, 
                          formdata={'prefixText': 'moscow', 'count': '10','contextKey':'All_0$Rus'},
                          headers={"X-Requested-With": "XMLHttpRequest"})
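To see what such a request carries without running Scrapy, here is a minimal sketch using only Python 3's standard library. It builds (but does not send) a form-encoded POST with the extra header, roughly what FormRequest with formdata and headers produces. urllib is swapped in for Scrapy purely so the request can be inspected offline, and build_form_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the question; the request is only built, never sent.
URL = "http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList"

def build_form_request(prefix, count="10", context_key="All_0$Rus"):
    """Hypothetical helper: form-encoded POST carrying the AJAX header."""
    body = urlencode({
        "prefixText": prefix,
        "count": count,
        "contextKey": context_key,
    }).encode("ascii")
    return Request(
        URL,
        data=body,
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

req = build_form_request("moscow")
# urllib stores header names normalized with str.capitalize().
print(req.get_method(), req.get_header("X-requested-with"))
print(req.data)
```

Note that "$" in the context key is percent-encoded as %24 in a form body.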

EDIT: I tried running the spider and came up with the following:

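(When I inspected the request body in Firefox, it was JSON-encoded, so I used a plain Request and forced the "POST" method. The response I got turned out to be encoded in "windows-1251".)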
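The fix hinges on the body format. A small standard-library sketch, assuming the endpoint expects JSON as the Firefox capture suggests, puts the two encodings of the same parameters side by side. FormRequest sends the first form; the edited spider sends the second.

```python
import json
from urllib.parse import urlencode

params = {"prefixText": "moscow", "count": "10", "contextKey": "All_0$Rus"}

# What FormRequest(formdata=...) sends: a form-encoded body.
form_body = urlencode(params)
form_type = "application/x-www-form-urlencoded"

# What the service appears to expect (per the Firefox capture): a JSON body.
json_body = json.dumps(params)
json_type = "application/json; charset=utf-8"

print(form_type, form_body)
print(json_type, json_body)
```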

from scrapy.spider import BaseSpider
from scrapy.http import Request
import json

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # The service expects a JSON body, so send a plain POST Request
        # instead of a form-encoded FormRequest.
        yield Request('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      method="POST",
                      body=json.dumps({
                            'prefixText': 'moscow',
                            'count': '10',
                            'contextKey': 'All_0$Rus'
                      }),
                      headers={
                            "X-Requested-With": "XMLHttpRequest",
                            "Accept": "application/json, text/javascript, */*; q=0.01",
                            "Content-Type": "application/json; charset=utf-8",
                            "Pragma": "no-cache",
                            "Cache-Control": "no-cache",
                      })

    def ati_from(self, response):
        # The response is windows-1251 encoded: decode it before parsing the JSON.
        jsondata = json.loads(response.body.decode("windows-1251"))
        print(jsondata)
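In recent Python 3 versions json.loads no longer accepts an encoding argument, and when given bytes it only auto-detects UTF-8/16/32, so a windows-1251 body has to be decoded first. A minimal sketch of that step follows; parse_cp1251_json is a hypothetical helper, and unwrapping a "d" key is an assumption (ASP.NET script services commonly wrap results that way, but this endpoint's actual format is not shown here).

```python
import json

def parse_cp1251_json(body):
    """Hypothetical helper: decode a windows-1251 body and parse it as JSON."""
    data = json.loads(body.decode("windows-1251"))
    # Assumption: ASP.NET script services often wrap the payload in "d".
    if isinstance(data, dict) and "d" in data:
        return data["d"]
    return data

# Made-up sample body for illustration only; the real response may differ.
sample = json.dumps({"d": ["Москва", "Московская обл."]},
                    ensure_ascii=False).encode("windows-1251")
print(parse_cp1251_json(sample))
```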
answered 2014-01-18T23:49:02.330