This is my code:

from scrapy import * 
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector 
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class lala(CrawlSpider):
    name="lala"
    start_url=["http://www.lala.net/"]       
    rule = [Rule(SgmlLinkExtractor(), follow=True, callback='self.parse')] 

    def __init__(self):
        super(lala, self).__init__(self)    
        print "\nworking\n"

    def parse(self,response):        
        print "\n\n Middle \n"  

print "\nend\n"

The problem is:

UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST

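Note that in this case both "end" and "working" are printed.

If I remove __init__, there is no error, but parse is never called, since the "Middle" message is not printed.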

2 Answers

You don't need to pass self when calling the inherited __init__() method through super():

def __init__(self):
    super(lala, self).__init__()    
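
Why the extra self matters can be sketched in plain Python. The Base class below is a hypothetical stand-in, not Scrapy's code; it assumes the base initializer takes an optional name as its first parameter, which is how the spider base class of that era appears to behave. Under that assumption, the spider's name is replaced by the spider object itself, which would explain why the logger cannot format its messages.

```python
class Base(object):
    """Hypothetical stand-in for a spider base class; not Scrapy's code."""

    def __init__(self, name=None):
        # Only overwrite the class-level name when one is passed in.
        if name is not None:
            self.name = name


class Wrong(Base):
    name = "lala"

    def __init__(self):
        # Bug: the bound call already supplies self, so this extra
        # argument lands in the `name` parameter.
        super(Wrong, self).__init__(self)


class Right(Base):
    name = "lala"

    def __init__(self):
        # Correct: nothing extra is passed, so the class-level name survives.
        super(Right, self).__init__()


wrong = Wrong()
right = Right()
print(wrong.name is wrong)  # the name is now the object, not a string
print(right.name)
```

With the corrected call, name stays the string "lala", so anything that formats it as text keeps working.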

Looking at the examples listed in the documentation, the attribute should be called rules, not rule:

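class lala(CrawlSpider):
    name = "lala"
    # Scrapy reads start_urls (plural); start_url is silently ignored.
    start_urls = ["http://www.lala.net/"]
    rules = [
        # The callback is a method name given as a string, without "self.",
        # and should not be parse, which CrawlSpider uses itself.
        Rule(SgmlLinkExtractor(), follow=True, callback='parse_item')
    ]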
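
The reason a misspelled attribute produces no error can be illustrated with a small stand-in. MiniCrawler below is hypothetical and much simpler than Scrapy's CrawlSpider; it assumes only that the framework reads rules and start_urls by those exact names and resolves string callbacks as attribute names on the spider.

```python
class MiniCrawler(object):
    """Hypothetical stand-in for a rule-driven spider base class."""

    rules = ()       # here simply a tuple of callback method names
    start_urls = []  # URLs the framework would request first

    def resolve_callbacks(self):
        # String callbacks are looked up as attributes on the spider;
        # a name that does not exist resolves to None without an error.
        return [getattr(self, name, None) for name in self.rules]


class Misnamed(MiniCrawler):
    # Neither attribute is ever read by the base class.
    rule = ("parse_item",)
    start_url = ["http://www.lala.net/"]


class SelfPrefixed(MiniCrawler):
    rules = ("self.parse_item",)  # no attribute has this literal name

    def parse_item(self, response):
        return "item"


class Correct(MiniCrawler):
    rules = ("parse_item",)
    start_urls = ["http://www.lala.net/"]

    def parse_item(self, response):
        return "item"


print(Misnamed().start_urls)                 # still the empty default
print(Misnamed().resolve_callbacks())        # no rules were picked up
print(SelfPrefixed().resolve_callbacks())    # the lookup found nothing
print(Correct().resolve_callbacks()[0]("page"))
```

In this sketch the misnamed spider simply has nothing to crawl and no rules to apply, which matches the symptom of a callback that is never invoked.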

answered 2013-04-09T13:20:01.017

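The Scrapy documentation explicitly warns against overriding the parse method when using CrawlSpider, because CrawlSpider uses parse itself to implement its logic.

Try renaming your parse method to something like parse_item, then try again.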
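
A rough sketch of why the override is harmful, using a hypothetical stand-in rather than Scrapy's real CrawlSpider (which is considerably more involved): the base class is assumed to do its rule handling inside parse, so a subclass that defines its own parse replaces that handling.

```python
class MiniCrawlSpider(object):
    """Hypothetical stand-in; not Scrapy's implementation."""

    rules = ()  # here simply a tuple of callback method names

    def parse(self, response):
        # The base class reserves parse() for its own rule handling.
        return [getattr(self, name)(response) for name in self.rules]


class OverridesParse(MiniCrawlSpider):
    rules = ("parse_item",)

    def parse(self, response):
        # Replaces the rule handling above, so parse_item never runs.
        return ["parse only"]

    def parse_item(self, response):
        return "item from " + response


class UsesParseItem(MiniCrawlSpider):
    rules = ("parse_item",)

    def parse_item(self, response):
        return "item from " + response


print(OverridesParse().parse("page"))
print(UsesParseItem().parse("page"))
```

Only the second spider ever reaches parse_item, which is the behavior the rename is meant to restore.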

answered 2013-04-09T16:39:55.320