What is the recommended course of action when Scrapy fails with the following exception?
OSError: [Errno 28] No space left on device
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/usr/lib/python3.6/site-packages/scrapy/downloadermiddlewares/httpcache.py", line 86, in process_response
    self._cache_response(spider, response, request, cachedresponse)
  File "/usr/lib/python3.6/site-packages/scrapy/downloadermiddlewares/httpcache.py", line 106, in _cache_response
    self.storage.store_response(spider, request, response)
  File "/usr/lib/python3.6/site-packages/scrapy/extensions/httpcache.py", line 317, in store_response
    f.write(to_bytes(repr(metadata)))
OSError: [Errno 28] No space left on device
In this particular case, a ramdisk/tmpfs limited to 128 MB is used as the cache disk, and HTTPCACHE_EXPIRATION_SECS = 300 is set with httpcache.FilesystemCacheStorage:
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 300
HTTPCACHE_DIR = '/tmp/ramdisk/scrapycache' # (tmpfs on /tmp/ramdisk type tmpfs (rw,relatime,size=131072k))
HTTPCACHE_IGNORE_HTTP_CODES = ['400','401','403','404','500','504']
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
I may be wrong, but I get the impression that Scrapy's FilesystemCacheStorage does not manage its cache very well when storage is limited (?).
Would it be better to use LevelDB?
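
For illustration, one possible workaround is to prune the cache directory periodically. The sketch below rests on two assumptions that should be checked against the installed Scrapy version: that FilesystemCacheStorage keeps one directory per cached request containing a `pickled_meta` file, and that entries older than HTTPCACHE_EXPIRATION_SECS are only skipped on read rather than deleted from disk. The function name `prune_expired` and its parameters are hypothetical and not part of Scrapy.

```python
import os
import shutil
import time


def prune_expired(cache_dir, max_age_secs, now=None):
    """Remove cache entry directories whose metadata file is older than max_age_secs.

    Assumption: each cached request lives in its own directory that
    contains a file named 'pickled_meta'. Returns the number of entry
    directories removed.
    """
    now = time.time() if now is None else now

    # Collect expired entries first so the tree is not modified while walking it.
    expired = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        if "pickled_meta" not in filenames:
            continue
        try:
            mtime = os.stat(os.path.join(dirpath, "pickled_meta")).st_mtime
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        if now - mtime > max_age_secs:
            expired.append(dirpath)

    for path in expired:
        shutil.rmtree(path, ignore_errors=True)
    return len(expired)


if __name__ == "__main__":
    # Values taken from the settings above.
    count = prune_expired("/tmp/ramdisk/scrapycache", 300)
    print("removed", count, "expired cache entries")
```

Something like this could be run from cron or between crawls. It only removes stale entries and does not cap the total cache size, so a single large crawl could still fill a 128 MB tmpfs within the 300-second window.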