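I'm trying to extract data from several different "tables" inside one "Main Table" on the same page (same URL). The item fields have the same XPath and the same structure in every sub-table, so the only problem I'm facing is how to add "multiple" XPaths for the table sections on this page.
Here is what my code looks like: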
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import TutorialItem


class MySpider(BaseSpider):
    name = "test"
    allowed_domains = ["blabla.com"]
    start_urls = ["http://www.blablabl..com"]  # The start URL doesn't change: same page

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = [hxs.select('//tr[@class="index class_tr group-6487"]')]
        # Here I would like to have multiple XPath selectors, e.g.:
        # titles = [hxs.select('//tr[@class="index class_tr group-6488"]')]
        # titles = [hxs.select('//tr[@class="index class_tr group-6489"]')]
        # Each one for a table section within the same 'Main Table'
        items = []
        for title in titles:
            item = TutorialItem()
            item['name'] = title.select('td[3]/span/a/text()').extract()
            item['encryption'] = title.select('td[5]/text()').extract()
            item['compression'] = title.select('td[8]/text()').extract()
            item['resolution'] = title.select('td[7]/span/text()').extract()
            items.append(item)
        return items
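I'd really appreciate it if this can be done. If I write a separate spider for each table section, I'll end up with 10 spiders for the same URL/table, and I'm not sure whether the data could then be retrieved, in order, into the same 'csv' file.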
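To make the goal concrete, here is a minimal sketch of the idea of looping over several row XPaths on one page and collecting everything into a single CSV. It is not Scrapy code: it uses only the standard library (`xml.etree.ElementTree` and `csv`) on a small, made-up, well-formed table, so `SAMPLE_HTML`, `GROUP_IDS`, `extract_rows` and `to_csv` are all hypothetical names for illustration. The group numbers 6487 to 6489 and the cell positions are taken from the code above. In a Scrapy spider, the same loop could presumably wrap the `hxs.select(...)` call inside one `parse` method, or the expressions could be joined with the XPath union operator `|`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page's "main table" (assumed markup, not the real site).
SAMPLE_HTML = """
<table>
  <tr class="index class_tr group-6487"><td>1</td><td>a</td><td><span><a>Alpha</a></span></td><td>-</td><td>AES</td><td>-</td><td><span>1080p</span></td><td>zip</td></tr>
  <tr class="other"><td>ignored</td></tr>
  <tr class="index class_tr group-6488"><td>2</td><td>b</td><td><span><a>Beta</a></span></td><td>-</td><td>none</td><td>-</td><td><span>720p</span></td><td>gzip</td></tr>
  <tr class="index class_tr group-6489"><td>3</td><td>c</td><td><span><a>Gamma</a></span></td><td>-</td><td>RSA</td><td>-</td><td><span>480p</span></td><td>none</td></tr>
</table>
"""

# Group ids from the question; one XPath is built per table section.
GROUP_IDS = ["6487", "6488", "6489"]
FIELDS = ["name", "encryption", "compression", "resolution"]


def extract_rows(root, group_ids):
    """Collect one dict per matching <tr>, visiting the groups in the given order."""
    rows = []
    for gid in group_ids:
        xpath = ".//tr[@class='index class_tr group-%s']" % gid
        for tr in root.findall(xpath):
            rows.append({
                "name": tr.findtext("td[3]/span/a"),
                "encryption": tr.findtext("td[5]"),
                "compression": tr.findtext("td[8]"),
                "resolution": tr.findtext("td[7]/span"),
            })
    return rows


def to_csv(rows):
    """Serialize all rows into one CSV document, preserving their order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


root = ET.fromstring(SAMPLE_HTML.strip())
rows = extract_rows(root, GROUP_IDS)
csv_text = to_csv(rows)
print(csv_text)
```

Because the rows are appended group by group inside one function, a single spider (rather than ten) would be enough, and the output order follows the order of `GROUP_IDS`.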