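I am trying to scrape multiple URLs, but for some reason results show up for only one site. In every case it is the last URL in start_urls.
I believe I have narrowed the problem down to my parse function.
Any ideas about what I am doing wrong?
Thanks!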
    import scrapy
    from scrapy_splash import SplashRequest

    class HeatSpider(scrapy.Spider):
        name = "heat"
        start_urls = [
            'https://www.expedia.com/Hotel-Search?#&destination=new+york&startDate=11/15/2016&endDate=11/16/2016&regionId=&adults=2',
            'https://www.expedia.com/Hotel-Search?#&destination=dallas&startDate=11/15/2016&endDate=11/16/2016&regionId=&adults=2',
        ]

        def start_requests(self):
            # Render each page through Splash and wait 8 seconds for the JavaScript to load.
            for url in self.start_urls:
                yield SplashRequest(url, self.parse,
                    endpoint='render.html',
                    args={'wait': 8},
                )

        def parse(self, response):
            # Yield one item per element matching .matrix-data.
            for metric in response.css('.matrix-data'):
                yield {
                    'City': response.css('title::text').extract_first(),
                    'Metric Data Title': metric.css('.title::text').extract_first(),
                    'Metric Data Price': metric.css('.price::text').extract_first(),
                }
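Edit:
I have changed my code to help with debugging. After running this code, my CSV looks like this: csv results. There is one row for each URL, as there should be, but only one row is filled in with information.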
    import scrapy
    from scrapy_splash import SplashRequest

    class HeatSpider(scrapy.Spider):
        name = "heat"
        start_urls = [
            'https://www.expedia.com/Hotel-Search?#&destination=new+york&startDate=11/15/2016&endDate=11/16/2016&regionId=&adults=2',
            'https://www.expedia.com/Hotel-Search?#&destination=dallas&startDate=11/15/2016&endDate=11/16/2016&regionId=&adults=2',
        ]

        def start_requests(self):
            for url in self.start_urls:
                yield SplashRequest(url, self.parse,
                    endpoint='render.html',
                    args={'wait': 8},
                )

        def parse(self, response):
            # Debug version: yield a single item per response, including its URL.
            yield {
                'City': response.css('title::text').extract_first(),
                'Metric Data Title': response.css('.matrix-data .title::text').extract(),
                'Metric Data Price': response.css('.matrix-data .price::text').extract(),
                'url': response.url,
            }
Edit 2: Here is the full output: http://pastebin.com/cLM3T05P. On line 46 you can see the empty cells.
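
One thing that may be worth checking (an assumption, not something the output above confirms): in both start_urls the search parameters come after `#`, so they form a URL fragment rather than a query string. The sketch below uses only the Python standard library to show that the two URLs are identical once the fragment is dropped. The helper name `without_fragment` is hypothetical, and whether Scrapy, scrapy-splash, or Splash actually treats the two requests as the same page in this setup is not verified here.

```python
from urllib.parse import urldefrag, urlsplit

# The two start URLs from the spider above.
START_URLS = [
    "https://www.expedia.com/Hotel-Search?#&destination=new+york&startDate=11/15/2016&endDate=11/16/2016&regionId=&adults=2",
    "https://www.expedia.com/Hotel-Search?#&destination=dallas&startDate=11/15/2016&endDate=11/16/2016&regionId=&adults=2",
]


def without_fragment(url):
    """Return the URL with everything from '#' onward removed (hypothetical helper)."""
    base, _fragment = urldefrag(url)
    return base


# Collect the distinct URLs that remain after the fragment is stripped.
bases = {without_fragment(url) for url in START_URLS}

for url in START_URLS:
    parts = urlsplit(url)
    # The query string is empty; the search parameters all sit in the fragment.
    print(repr(parts.query), "|", parts.fragment)

print(len(bases), "distinct URL(s) once the fragment is ignored")
```

If the fragment does turn out to matter, comparing `response.url` with the requested URL in the CSV would be one way to confirm which page each row really came from.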