Since my requests pass through multiple functions, I am having some trouble sending my items through the pipeline.
I just want some manual way to send an item object to a Scrapy pipeline, because I don't know Scrapy's internals.
Suppose I have a function like this:
def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    sendToPipeline(pipelineName, item)
def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    # Send the item through the item pipeline manually.
    itemproc = self.crawler.engine.scraper.itemproc
    itemproc.process_item(item, self)
    return item
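Note that process_item returns a twisted Deferred, so a DropItem raised inside one of your pipelines will otherwise surface as an unhandled error in that Deferred. If you stay with the manager route, a minimal sketch of trapping it with an errback (assuming the same DmozItem as above):

from scrapy.exceptions import DropItem

def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    itemproc = self.crawler.engine.scraper.itemproc
    # process_item returns a Deferred; trapping DropItem in an errback
    # keeps a dropped item from being reported as an unhandled error.
    d = itemproc.process_item(item, self)
    d.addErrback(lambda failure: failure.trap(DropItem))
    return item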
If you delegate directly to the ItemPipelineManager like this, you will get an unhandled exception inside the manager:
[2018-07-21 20:00:02] CRITICAL - Unhandled error in Deferred:
[2018-07-21 20:00:02] CRITICAL -
Traceback (most recent call last):
File "/home/vagrant/.local/share/virtualenvs/vagrant-gKDsaKU3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/vagrant/monitor/pipelines/filter.py", line 24, in process_item
raise DropItem()
scrapy.exceptions.DropItem
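For context, the DropItem in that traceback comes from a project-specific FilterPipeline; a rough sketch of what such a pipeline might look like (the field check is invented for illustration):

from scrapy.exceptions import DropItem

class FilterPipeline:
    def process_item(self, item, spider):
        # Drop items that fail some project-specific condition.
        if not item.get('test'):
            raise DropItem()
        return item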
It can also unintentionally change the pipelines' state and affect processing.
I think a better approach is to get the instance of the pipeline you are looking for and call it directly:
try:
    # Manually call the filter pipeline
    f = utils.get_pipeline_instance(self, FilterPipeline)
    f.process_item(item, self)
except DropItem:
    pass
With a helper function:
def get_pipeline_instance(spider, pipeline_class):
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    else:
        raise NotConfigured('Invalid pipeline')
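Put together, a rough end-to-end sketch (the DmozItem, FilterPipeline and utils import paths are placeholders for wherever those live in your project):

from scrapy.exceptions import DropItem
from myproject.items import DmozItem            # placeholder paths
from myproject.pipelines import FilterPipeline
from myproject import utils

def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    try:
        # Run only the FilterPipeline against this item, bypassing the manager.
        pipeline = utils.get_pipeline_instance(self, FilterPipeline)
        pipeline.process_item(item, self)
    except DropItem:
        return None  # the filter rejected this item
    return item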