
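Because my request passes through several functions, I'm having trouble sending my item through the pipeline.

I'm hoping there is a manual way to send an item object to a Scrapy pipeline, since I don't know Scrapy's internals.

Suppose I have a function that gets called: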

def parseDetails(self, response):

  item = DmozItem()
  item['test'] = "mytest"

  sendToPiepline(piplineName , item)

2 Answers


scrapy/commands/parse.py

def parseDetails(self, response):
  item = DmozItem()
  item['test'] = "mytest"

  # Call pipeline.
  itemproc = self.crawler.engine.scraper.itemproc
  itemproc.process_item(item, self)

  return item
answered 2013-04-17T13:04:29.863

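If you delegate directly to the ItemPipelineManager, any exception a pipeline raises (such as DropItem) surfaces as an unhandled error inside the manager: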

[2018-07-21 20:00:02] CRITICAL - Unhandled error in Deferred:

[2018-07-21 20:00:02] CRITICAL -
Traceback (most recent call last):
  File "/home/vagrant/.local/share/virtualenvs/vagrant-gKDsaKU3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/vagrant/monitor/pipelines/filter.py", line 24, in process_item
    raise DropItem()
scrapy.exceptions.DropItem

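Doing so may also inadvertently change the state of the pipelines and affect processing.

I think a better approach is to get the instance of the pipeline you are looking for and call it directly: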

try:
    # Manually call the filter
    f = utils.get_pipeline_instance(self, FilterPipeline)
    f.process_item(p, self)
except DropItem:
    pass

Using this helper function:

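from scrapy.exceptions import NotConfigured

def get_pipeline_instance(spider, pipeline_class):
    # Walk the enabled pipelines and return the first instance of the requested class.
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    raise NotConfigured('Invalid pipeline')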
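
To see the lookup and the direct call working without a running crawl, here is a minimal sketch that uses stand-ins rather than Scrapy itself. Everything except `get_pipeline_instance` is hypothetical: `make_fake_spider` only mimics the `spider.crawler.engine.scraper.itemproc.middlewares` attribute chain used above, the two exception classes stand in for `scrapy.exceptions.NotConfigured` and `scrapy.exceptions.DropItem`, and `FilterPipeline` here is an invented filter, not the one from my project.

```python
from types import SimpleNamespace


class NotConfigured(Exception):
    """Stand-in for scrapy.exceptions.NotConfigured (assumption for this sketch)."""


class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem (assumption for this sketch)."""


class FilterPipeline:
    """Hypothetical pipeline that drops items lacking a 'test' field."""

    def process_item(self, item, spider):
        if not item.get("test"):
            raise DropItem("missing 'test'")
        return item


class OtherPipeline:
    """Hypothetical pass-through pipeline, present only to exercise the lookup."""

    def process_item(self, item, spider):
        return item


def get_pipeline_instance(spider, pipeline_class):
    # Same lookup as the helper above: first enabled pipeline of the given class.
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    raise NotConfigured(f"{pipeline_class.__name__} is not enabled")


def make_fake_spider(pipelines):
    # Build only the attribute chain the helper walks; this is not a real spider.
    itemproc = SimpleNamespace(middlewares=tuple(pipelines))
    scraper = SimpleNamespace(itemproc=itemproc)
    engine = SimpleNamespace(scraper=scraper)
    crawler = SimpleNamespace(engine=engine)
    return SimpleNamespace(crawler=crawler)


def run_filter(spider, item):
    """Return the item if the filter accepts it, or None if it was dropped."""
    try:
        pipeline = get_pipeline_instance(spider, FilterPipeline)
        return pipeline.process_item(item, spider)
    except DropItem:
        return None
```

Because the pipeline is called synchronously here, `DropItem` propagates as an ordinary exception and can be caught at the call site, instead of ending up as a failure in the manager's Deferred.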
answered 2018-07-21T20:22:47.147