我在使用 Scrapy 时遇到问题,由于某种原因它没有进入我的解析方法,我不知道为什么会这样。我尝试了不同的选择但没有成功。
这就是我的代码现在的样子。具体来说,有两条打印语句,而 parse() 方法中的一条没有被调用。
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from comments.items import CustomerReview
import re
class AppidSpider(BaseSpider):
name = "appid"
allowed_domains = ["itunes.apple.com"]
start_urls = [
"http://itunes.apple.com/us/genre/ios/id36?mt=8"
]
rules = [Rule(SgmlLinkExtractor(), follow=True, callback='parse')]
print "---> THIS IS TEST 1"
def parse(self, response):
print " ----> THIS IS TEST 2"
# ... More code afterwards
这就是输出。如您所见, TEST 2 从未打印过。
$ scrapy crawl appid
2012-07-05 13:41:02+0000 [scrapy] INFO: Scrapy 0.14.4 started (bot: comments)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
---> THIS IS TEST 1
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2012-07-05 13:41:02+0000 [appid] INFO: Spider opened
2012-07-05 13:41:02+0000 [appid] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-07-05 13:41:02+0000 [appid] DEBUG: Crawled (200) <GET http://itunes.apple.com/us/genre/ios/id36?mt=8> (referer: None)
2012-07-05 13:41:02+0000 [appid] INFO: Closing spider (finished)
2012-07-05 13:41:02+0000 [appid] INFO: Dumping spider stats:
{'downloader/request_bytes': 222,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 9927,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 694678),
'scheduler/memory_enqueued': 1,
'start_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 604025)}
2012-07-05 13:41:02+0000 [appid] INFO: Spider closed (finished)
2012-07-05 13:41:02+0000 [scrapy] INFO: Dumping global stats:
{'memusage/max': 95318016, 'memusage/startup': 95318016}