I am working with Scrapy and trying to collect some data from a website.
Spider code:
class NaaptolSpider(BaseSpider):
    name = "naaptol"
    domain_name = "www.naaptol.com"
    start_urls = ["http://www.naaptol.com/buy/mobile_phones/mobile_handsets.html"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        cell_matter = hxs.select('//div[@class="gridInfo"]/div[@class="gridProduct gridProduct_special"]')
        items = []
        for i in cell_matter:
            cell_names = i.select('//p[@class="proName"]/a/text()').extract()
            prices = i.select('//p[@class="values"]/strong/text()').extract()
            item = ExampleItem()
            item['cell_name'] = cell_names
            item['price'] = prices
            items.append(item)
        return [FormRequest(url="http://www.naaptol.com/faces/jsp/search/searchResults.jsp",
                            formdata={'type': 'cat_catlg',
                                      'catid': '27',
                                      'sb': '9,8',
                                      'frm': '1',
                                      'max': '15',
                                      'req': 'ajax'
                                      },
                            callback=self.parse_item
                            )]

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        cell_matter = hxs.select('//div[@class="gridInfo"]/div[@class="gridProduct gridProduct_special"]')
        for i in cell_matter:
            cell_names = i.select('//p[@class="proName"]/a/text()').extract()
            prices = i.select('//p[@class="values"]/strong/text()').extract()
            print cell_names
            print prices
Result:
2012-06-15 09:38:36+0530 [naaptol] DEBUG: Crawled (200) <POST http://www.naaptol.com/faces/jsp/search/searchResults.jsp> (referer: http://www.naaptol.com/buy/mobile_phones/mobile_handsets.html)
[]
[]
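One possible reading of the empty lists is that the body returned by the POST does not contain the markup the XPath expects. The following is a minimal sketch of how that could be checked, not a confirmed diagnosis. It uses only the Python 3 standard library rather than Scrapy, and the helper `missing_classes` and both sample bodies are hypothetical; in the spider, the string to inspect would be `response.body`.

```python
import re


def missing_classes(body, expected):
    """Return the expected CSS class names that never occur in a class="..." attribute."""
    found = set()
    for value in re.findall(r'class="([^"]*)"', body):
        # A class attribute may hold several space-separated names.
        found.update(value.split())
    return sorted(set(expected) - found)


# Hypothetical bodies: one shaped like the listing page, one like a differently
# structured AJAX fragment. Neither is real output from the site.
full_page = (
    '<div class="gridInfo"><div class="gridProduct gridProduct_special">'
    '<p class="proName"><a>Phone A</a></p>'
    '<p class="values"><strong>100</strong></p></div></div>'
)
ajax_fragment = '<ul><li class="item"><a>Phone B</a></li></ul>'

expected = ["gridInfo", "proName", "values"]
missing_full = missing_classes(full_page, expected)
missing_ajax = missing_classes(ajax_fragment, expected)
```

If the second kind of result showed up for the real response, the XPath would need to be rewritten against the markup that the AJAX endpoint actually returns.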
I am actually posting the form to reproduce the pagination that the site implements in JavaScript.
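For comparison with the request the browser sends during pagination, this sketch shows the URL-encoded body that a form POST carrying the same fields would have. It uses only the Python 3 standard library, not Scrapy, and the variable names `body` and `decoded` are illustrative.

```python
from urllib.parse import urlencode, parse_qsl

# The same fields that are passed as formdata in the spider.
formdata = {
    "type": "cat_catlg",
    "catid": "27",
    "sb": "9,8",
    "frm": "1",
    "max": "15",
    "req": "ajax",
}

# Form posts are sent as application/x-www-form-urlencoded key=value pairs.
body = urlencode(formdata)

# Decoding the body again gives back the original fields.
decoded = dict(parse_qsl(body))
```

Comparing `body` with the payload shown in the browser's network tab could reveal a missing or differently named field.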
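In parse_item I receive the response to the request issued from parse, but when I use the same XPath expressions as in parse, they return empty lists, as shown above. Can anyone tell me why they return empty lists and what is wrong with my code?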
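A separate point about the loops in both methods: an XPath that starts with `//` is evaluated against the whole document in Scrapy, even when called on a nested selector, so every iteration would return the same full list. A path starting with `.//` is scoped to the current node. The sketch below illustrates that per-node scoping with the standard library's `xml.etree.ElementTree` instead of Scrapy selectors, on hypothetical well-formed markup; it does not by itself explain the empty lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the listing page; not real site output.
SAMPLE = """
<html><body>
<div class="gridInfo">
  <div class="gridProduct gridProduct_special">
    <p class="proName"><a>Phone A</a></p>
    <p class="values"><strong>100</strong></p>
  </div>
  <div class="gridProduct gridProduct_special">
    <p class="proName"><a>Phone B</a></p>
    <p class="values"><strong>200</strong></p>
  </div>
</div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())
products = root.findall(
    ".//div[@class='gridInfo']/div[@class='gridProduct gridProduct_special']"
)

rows = []
for product in products:
    # The leading "." keeps each lookup inside the current product node.
    name = product.findtext(".//p[@class='proName']/a")
    price = product.findtext(".//p[@class='values']/strong")
    rows.append((name, price))
```

In the spider, the equivalent change would be to write the inner expressions as `.//p[@class="proName"]/a/text()` and `.//p[@class="values"]/strong/text()`.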
Thanks in advance.