
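I'm trying to log in with scrapy-splash exactly the same way I would with plain Scrapy. The documentation (Doc) says "SplashFormRequest.from_response is also supported, and works as described in scrapy documentation". However, changing one line of code and updating the settings as the Splash docs describe doesn't produce any result. What am I doing wrong? Code: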

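import scrapy
# SplashFormRequest must be imported explicitly; importing only SplashRequest
# would raise a NameError as soon as parse() runs.
from scrapy_splash import SplashRequest, SplashFormRequest


class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl"]  # placeholder URL from the original post

    def parse(self, response):
        # Fill in the login form found on the page and submit it through Splash
        return SplashFormRequest.from_response(
            response,
            formdata={'username': 'pass', 'password': 'pass'},
            callback=self.after_login,
        )

    def after_login(self, response):
        print(response.text)
        # "keyword" stands for text that only appears after a successful login
        if "keyword" in response.text:
            self.logger.error("Success")
        else:
            self.logger.error("Failed")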

Added to settings:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

Error log:

python@debian:~/Python/code/lostfilm$ scrapy crawl lost
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: lostfilm)
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'lostfilm.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_MODULES': ['lostfilm.spiders'], 'BOT_NAME': 'lostfilm', 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage'}
2017-01-26 20:24:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
Unhandled error in Deferred:
2017-01-26 20:24:22 [twisted] CRITICAL: Unhandled error in Deferred:

2017-01-26 20:24:22 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 49, in load_object
    raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))
NameError: Module 'scrapy.downloadermiddlewares.httpcompression' doesn't define any object named 'HttpCompresionMiddlerware'

1 Answer


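You probably also need to make the first request through Splash.

By default, the start_urls attribute produces "simple" scrapy.Request objects, not SplashRequest objects.

You need to override the start_requests method of your spider: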

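import scrapy
from scrapy_splash import SplashRequest


class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl",]

    def start_requests(self):
        # Send the initial request through Splash instead of a plain scrapy.Request
        for url in self.start_urls:
            yield SplashRequest(url)

    # ... parse() and after_login() as in the question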
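Separately, the traceback in the question shows the crawl failing before any request is sent: Scrapy could not load a downloader middleware named HttpCompresionMiddlerware from scrapy.downloadermiddlewares.httpcompression, because the class in that module is called HttpCompressionMiddleware. The settings quoted in the question spell it correctly, so the settings file Scrapy actually loaded probably still contains the typo. Below is a sketch of the relevant settings.py block, assuming Splash listens on http://localhost:8050. The SPIDER_MIDDLEWARES entry is not in the question; it is taken from the setup recommended in the scrapy-splash README.

```python
# settings.py (sketch): scrapy-splash configuration with correctly spelled class paths.
# Assumes a Splash instance is reachable at localhost:8050.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    # Spelled "Compression" and "Middleware"; a typo here makes Scrapy abort at startup
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Recommended by the scrapy-splash README (not part of the original question)
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

Running scrapy settings --get DOWNLOADER_MIDDLEWARES from the project directory may help confirm which value Scrapy really picks up.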
answered 2017-01-26T10:01:42.757