CrawlSpider rules don't work that way. You'll probably want to subclass BaseSpider and implement your own link extraction in the spider callback. For example:
from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import XmlXPathSelector

class MySpider(BaseSpider):
    name = 'myspider'

    def parse(self, response):
        xxs = XmlXPathSelector(response)
        links = xxs.select("//link/text()").extract()
        return [Request(x, callback=self.parse_link) for x in links]

    def parse_link(self, response):
        # Handle each followed link here.
        pass
You can also try out the XPath in the shell, for example by running:
scrapy shell http://blog.scrapy.org/rss.xml
And then typing in the shell:
>>> xxs.select("//link/text()").extract()
[u'http://blog.scrapy.org',
u'http://blog.scrapy.org/new-bugfix-release-0101',
u'http://blog.scrapy.org/new-scrapy-blog-and-scrapy-010-release']
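If you just want to check what `//link/text()` matches without starting Scrapy at all, here is a minimal sketch using only the standard library, with an inline sample feed (the feed content is invented for illustration; ElementTree only supports a limited XPath subset, so `.//link` stands in for `//link`):

```python
import xml.etree.ElementTree as ET

# A tiny RSS-like document standing in for the real feed.
rss = """<rss><channel>
  <link>http://blog.scrapy.org</link>
  <item><link>http://blog.scrapy.org/new-bugfix-release-0101</link></item>
</channel></rss>"""

root = ET.fromstring(rss)
# ".//link" matches all <link> elements at any depth, in document order,
# much like //link in full XPath; .text gives the text node.
links = [el.text for el in root.findall(".//link")]
print(links)
```

This mirrors what the Scrapy shell session above does, just without the HTTP request.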