I have a Scrapy spider (code at this gist) that seems to run fine, except that it suddenly stops for no apparent reason. When it stops, the end of the log file reads:
2012-12-28 23:42:04+0000 [church] DEBUG: Crawled (200) <GET http://www.achurchnearyou.com/cogges-st-mary/> (referer: http://www.achurchnearyou.com/clifton-reynes-st-mary-the-virgin/)
2012-12-28 23:42:04+0000 [church] DEBUG: Scraped from <200 http://www.achurchnearyou.com/cogges-st-mary/>
{'archdeaconry': u'OXFORD',
'archdeaconry_id': u'271',
'benefice': u'Cogges and S Leigh',
'benefice_id': u'27',
'deanery': u'WITNEY',
'deanery_id': u'27109',
'legal_name': u'Cogges',
'parish_id': u'270245'}
2012-12-28 23:42:04+0000 [church] DEBUG: Redirecting (301) to <GET http://www.achurchnearyou.com//> from <GET http://www.achurchnearyou.com/venue.php?V=0083>
2012-12-28 23:42:04+0000 [church] INFO: Closing spider (finished)
Is there any reason the spider might finish immediately after a redirected URL? Interestingly, I have some custom DownloaderMiddleware that catches redirects like this and creates a new request (basically, some of the URLs I'm trying redirect to the homepage; I want to ignore those and request a different URL instead).
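For reference, the middleware described above would look roughly like the sketch below. This is a hypothetical reconstruction, not the actual code from the gist: the stand-in `Request`/`Response` classes let the example run without Scrapy installed (in a real project they come from `scrapy.http`), and the URL rewrite rule is invented for illustration. One detail worth noting: a replacement `Request` returned from middleware goes back through Scrapy's duplicate filter, and if its URL has already been seen it is dropped silently unless `dont_filter=True` is set, which can make a spider close with "finished" right after a redirect like the one in the log.

```python
# Hypothetical sketch of the redirect-catching DownloaderMiddleware
# described above. Stand-in Request/Response classes are used so the
# example runs without Scrapy; a real middleware would use scrapy.http.

class Request:
    def __init__(self, url, dont_filter=False):
        self.url = url
        self.dont_filter = dont_filter

class Response:
    def __init__(self, url, status, headers=None):
        self.url = url
        self.status = status
        self.headers = headers or {}

class HomepageRedirectMiddleware:
    """If a venue URL 301-redirects to the homepage, schedule an
    alternative request instead of following the redirect."""

    HOMEPAGE = "http://www.achurchnearyou.com//"

    def process_response(self, request, response, spider=None):
        if (response.status == 301
                and response.headers.get("Location") == self.HOMEPAGE):
            # Hypothetical rewrite of the original URL into an alternative
            # form; the real transformation would depend on the site.
            new_url = request.url.replace("venue.php?V=", "venue.php?id=")
            # dont_filter=True stops the scheduler's dupefilter from
            # silently dropping the replacement request -- without it,
            # an already-seen URL is discarded and the spider may finish.
            return Request(new_url, dont_filter=True)
        return response
```

Used against the redirect from the log above, it would replace the response with a new request:

```python
mw = HomepageRedirectMiddleware()
req = Request("http://www.achurchnearyou.com/venue.php?V=0083")
resp = Response(req.url, 301,
                {"Location": "http://www.achurchnearyou.com//"})
out = mw.process_response(req, resp)   # -> a Request for the rewritten URL
```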